Preface


This book was written for an introductory one-semester or two-quarter course 
in probability and statistics for students in engineering and applied sciences. No 
previous knowledge of probability or statistics is presumed, but a good
understanding of calculus is a prerequisite for the material.

The development of this book was guided by a number of considerations 
observed over many years of teaching courses in this subject area, including the 
following: 

• As an introductory course, a sound and rigorous treatment of the basic 
principles is imperative for a proper understanding of the subject matter 
and for confidence in applying these principles to practical problem solving. 
A student, depending upon his or her major field of study, will no doubt 
pursue advanced work in this area in one or more of the many possible 
directions. How well he or she is prepared to do this depends strongly on
his or her mastery of the fundamentals.

• It is important that the student develop an early appreciation for
applications. Demonstrations of the utility of this material in nonsuperficial
applications not only sustain student interest but also provide the student with
stimulation to delve more deeply into the fundamentals.

• Most of the students in engineering and applied sciences can only devote one 
semester or two quarters to a course of this nature in their programs. 
Recognizing that the coverage is time limited, it is important that the material 
be self-contained, representing a reasonably complete and applicable body of 
knowledge. 

The choice of the contents for this book is in line with the foregoing 
observations. The major objective is to give a careful presentation of the 
fundamentals in probability and statistics, the concept of probabilistic
modeling, and the process of model selection, verification, and analysis. In this
text, definitions and theorems are carefully stated and topics rigorously treated,
but care is taken not to become entangled in excessive mathematical details.


Practical examples are emphasized; they are purposely selected from many 
different fields and not slanted toward any particular applied area. The same 
objective is observed in making up the exercises at the back of each chapter. 

Because of the self-imposed criterion of writing a comprehensive text and 
presenting it within a limited time frame, there is a tight continuity from one 
topic to the next. Some flexibility exists in Chapters 6 and 7, which include
discussions of more specialized distributions used in practice. For example,
extreme-value distributions may be bypassed, if it is deemed necessary, without 
serious loss of continuity. Also, Chapter 11 on linear models may be deferred to 
a follow-up course if time does not allow its full coverage. 

It is a pleasure to acknowledge the substantial help I received from students 
in my courses over many years and from my colleagues and friends. Their 
constructive comments on preliminary versions of this book led to many 
improvements. My sincere thanks go to Mrs. Carmella Gosden, who efficiently 
typed several drafts of this book. As in all my undertakings, my wife, Dottie, 
cared about this project and gave me her loving support, for which I am deeply
grateful. 


T. T. Soong
Buffalo, New York 
Introduction


At present, almost all undergraduate curricula in engineering and applied 
sciences contain at least one basic course in probability and statistical inference. 
The recognition of this need for introducing the ideas of probability theory in 
a wide variety of scientific fields today reflects in part some of the profound 
changes in science and engineering education over the past 25 years. 

One of the most significant is the greater emphasis that has been placed upon 
complexity and precision. A scientist now recognizes the importance of
studying scientific phenomena having complex interrelations among their
components; these components are often not only mechanical or electrical parts but
also ‘soft-science’ in nature, such as those stemming from behavioral and social 
sciences. The design of a comprehensive transportation system, for example, 
requires a good understanding of technological aspects of the problem as well 
as of the behavior patterns of the user, land-use regulations, environmental 
requirements, pricing policies, and so on. 

Moreover, precision is stressed - precision in describing interrelationships 
among factors involved in a scientific phenomenon and precision in predicting 
its behavior. This, coupled with increasing complexity in the problems we face, 
leads to the recognition that a great deal of uncertainty and variability are 
inevitably present in problem formulation, and one of the mathematical tools 
that is effective in dealing with them is probability and statistics. 

Probabilistic ideas are used in a wide variety of scientific investigations 
involving randomness. Randomness is an empirical phenomenon characterized 
by the property that the quantities in which we are interested do not have 
a predictable outcome under a given set of circumstances, but instead there is 
a statistical regularity associated with different possible outcomes. Loosely
speaking, statistical regularity means that, in observing outcomes of an
experiment a large number of times (say n), the ratio m/n, where m is the number of
observed occurrences of a specific outcome, tends to a unique limit as n
becomes large. For example, the outcome of flipping a coin is not predictable
but there is statistical regularity in that the ratio m/n approaches 1/2 for either
heads or tails. Random phenomena in scientific areas abound: noise in radio
signals, intensity of wind gusts, mechanical vibration due to atmospheric
disturbances, Brownian motion of particles in a liquid, number of telephone calls
made by a given population, length of queues at a ticket counter, choice of 
transportation modes by a group of individuals, and countless others. It is not 
inaccurate to say that randomness is present in any realistic conceptual model 
of a real-world phenomenon. 


1.1 ORGANIZATION OF TEXT 

This book is concerned with the development of basic principles in constructing 
probability models and the subsequent analysis of these models. As in other 
scientific modeling procedures, the basic cycle of this undertaking consists of 
a number of fundamental steps; these are schematically presented in Figure 1.1. 
A basic understanding of probability theory and random variables is central to 
the whole modeling process, as they provide the required mathematical machinery
with which the modeling process is carried out and consequences deduced.
The step from B to C in Figure 1.1 is the induction step by which the structure
of the model is formed from factual observations of the scientific phenomenon
under study. Model verification and parameter estimation (E) on the basis of
observed data (D) fall within the framework of statistical inference. A model
may be rejected at this stage as a result of inadequate inductive reasoning or
insufficient or deficient data. A reexamination of factual observations or
additional data may be required here. Finally, model analysis and deduction are
made to yield desired answers upon model substantiation.

Figure 1.1 Basic cycle of probabilistic modeling and analysis

In line with this outline of the basic steps, the book is divided into two parts. 
Part A (Chapters 2-7) addresses probability fundamentals involved in steps
A → C, B → C, and E → F (Figure 1.1). Chapters 2-5 provide these
fundamentals, which constitute the foundation of all subsequent development. Some
important probability distributions are introduced in Chapters 6 and 7. The 
nature and applications of these distributions are discussed. An understanding 
of the situations in which these distributions arise enables us to choose an 
appropriate distribution, or model, for a scientific phenomenon. 

Part B (Chapters 8-11) is concerned principally with step D → E (Figure 1.1),
the statistical inference portion of the text. Starting with data and data
representation in Chapter 8, parameter estimation techniques are carefully developed
in Chapter 9, followed by a detailed discussion in Chapter 10 of a number of 
selected statistical tests that are useful for the purpose of model verification. In 
Chapter 11, the tools developed in Chapters 9 and 10 for parameter estimation 
and model verification are applied to the study of linear regression models, a very 
useful class of models encountered in science and engineering. 

The topics covered in Part B are somewhat selective, but much of the 
foundation in statistical inference is laid. This foundation should help the 
reader to pursue further studies in related and more advanced areas. 


1.2 PROBABILITY TABLES AND COMPUTER SOFTWARE 

The application of the materials in this book to practical problems will require 
calculations of various probabilities and statistical functions, which can be
time-consuming. To facilitate these calculations, some of the probability tables are
provided in Appendix A. It should be pointed out, however, that a large
number of computer software packages and spreadsheets are now available
that provide this information as well as perform a host of other statistical
calculations. As an example, some statistical functions available in Microsoft®
Excel™ 2000 are listed in Appendix B.


1.3 PREREQUISITES 

The material presented in this book is calculus-based. The mathematical
prerequisite for a course using this book is a good understanding of differential
and integral calculus, including partial differentiation and multidimensional
integrals. Familiarity with linear algebra, vectors, and matrices is also required.





Part A 


Probability and Random Variables 





2 


Basic Probability Concepts 


The mathematical theory of probability gives us the basic tools for constructing 
and analyzing mathematical models for random phenomena. In studying a 
random phenomenon, we are dealing with an experiment of which the outcome 
is not predictable in advance. Experiments of this type that immediately come 
to mind are those arising in games of chance. In fact, the earliest development 
of probability theory in the fifteenth and sixteenth centuries was motivated by 
problems of this type (for example, see Todhunter, 1949). 

In science and engineering, random phenomena describe a wide variety of 
situations. By and large, they can be grouped into two broad classes. The first 
class deals with physical or natural phenomena involving uncertainties.
Uncertainty enters into problem formulation through complexity, through our lack
of understanding of all the causes and effects, and through lack of information. 
Consider, for example, weather prediction. Information obtained from satellite 
tracking and other meteorological information simply is not sufficient to permit 
a reliable prediction of what weather condition will prevail in days ahead. It is 
therefore easily understandable that weather reports on radio and television are 
made in probabilistic terms. 

The second class of problems widely studied by means of probabilistic 
models concerns those exhibiting variability. Consider, for example, a problem 
in traffic flow where an engineer wishes to know the number of vehicles
crossing a certain point on a road within a specified interval of time. This number
varies unpredictably from one interval to another, and this variability reflects 
variable driver behavior and is inherent in the problem. This property forces us 
to adopt a probabilistic point of view, and probability theory provides a 
powerful tool for analyzing problems of this type. 

It is safe to say that uncertainty and variability are present in our modeling of 
all real phenomena, and it is only natural to see that probabilistic modeling and 
analysis occupy a central place in the study of a wide variety of topics in science 
and engineering. There is no doubt that we will see an increasing reliance on the 
use of probabilistic formulations in most scientific disciplines in the future. 
2.1 ELEMENTS OF SET THEORY

Our interest in the study of a random phenomenon is in the statements we can 
make concerning the events that can occur. Events and combinations of events 
thus play a central role in probability theory. The mathematics of events is 
closely tied to the theory of sets, and we give in this section some of its basic 
concepts and algebraic operations. 

A set is a collection of objects possessing some common properties. These 
objects are called elements of the set and they can be of any kind with any 
specified properties. We may consider, for example, a set of numbers, a set of 
mathematical functions, a set of persons, or a set of a mixture of things. Capital
letters A, B, C, Φ, Ω, ... shall be used to denote sets, and lower-case letters
a, b, c, φ, ω, ... to denote their elements. A set is thus described by its elements.
Notationally, we can write, for example, 

$$A = \{1, 2, 3, 4, 5, 6\},$$

which means that set A has as its elements integers 1 through 6. If set B contains
two elements, success and failure, it can be described by

$$B = \{s, f\},$$

where s and f are chosen to represent success and failure, respectively. For a set
consisting of all nonnegative real numbers, a convenient description is 

$$C = \{x : x \geq 0\}.$$

We shall use the convention 

$$a \in A \tag{2.1}$$

to mean ‘element a belongs to set A’.

A set containing no elements is called an empty or null set and is denoted by ∅.
We distinguish between sets containing a finite number of elements and those 
having an infinite number. They are called, respectively, finite sets and infinite 
sets. An infinite set is called enumerable or countable if all of its elements can be 
arranged in such a way that there is a one-to-one correspondence between them 
and all positive integers; thus, a set containing all positive integers 1, 2, ... is a
simple example of an enumerable set. A nonenumerable or uncountable set is one 
where the above-mentioned one-to-one correspondence cannot be established. A 
simple example of a nonenumerable set is the set C described above. 

If every element of a set A is also an element of a set B, the set A is called 
a subset of B and this is represented symbolically by 

$$A \subset B \quad \text{or} \quad B \supset A. \tag{2.2}$$





Figure 2.1 Venn diagram for A ⊂ B


Example 2.1. Let A = {2, 4} and B = {1, 2, 3, 4}. Then A ⊂ B, since every
element of A is also an element of B. This relationship can also be presented 
graphically by using a Venn diagram, as shown in Figure 2.1. The set B 
occupies the interior of the larger circle and A the shaded area in the figure. 

It is clear that an empty set is a subset of any set. When both A ⊂ B and
B ⊂ A, set A is then equal to B, and we write

$$A = B. \tag{2.3}$$


We now give meaning to a particular set we shall call space. In our
development, we consider only sets that are subsets of a fixed (nonempty) set. This
‘largest’ set containing all elements of all the sets under consideration is called 
space and is denoted by the symbol S. 

Consider a subset A in S. The set of all elements in S that are not elements of
A is called the complement of A, and we denote it by Ā. A Venn diagram
showing A and Ā is given in Figure 2.2, in which space S is shown as a rectangle
and Ā is the shaded area. We note here that the following relations clearly hold:

$$\bar{S} = \varnothing, \quad \bar{\varnothing} = S, \quad \bar{\bar{A}} = A. \tag{2.4}$$


2.1.1 SET OPERATIONS 

Let us now consider some algebraic operations of sets A, B, C ,... that are 
subsets of space S. 

The union or sum of A and B, denoted by A ∪ B, is the set of all elements
belonging to A or B or both. 



Figure 2.2 A and Ā
Figure 2.3 (a) Union and (b) intersection of sets A and B


The intersection or product of A and B, written as A ∩ B, or simply AB, is the
set of all elements that are common to A and B. 

In terms of Venn diagrams, results of the above operations are shown in 
Figures 2.3(a) and 2.3(b) as sets having shaded areas. 

If AB = ∅, sets A and B contain no common elements, and we call A and B
disjoint. The symbol ‘+’ shall be reserved to denote the union of two disjoint 
sets when it is advantageous to do so. 

Example 2.2. Let A be the set of all men and B consist of all men and women
over 18 years of age. Then the set A ∪ B consists of all men as well as all women
over 18 years of age. The elements of A ∩ B are all men over 18 years of age.

Example 2.3. Let S be the space consisting of a real-line segment from 0 to 10,
and let A and B be the real-line segments from 1 to 7 and from 3 to 9, respectively.
Line segments belonging to A ∪ B, A ∩ B, Ā, and B̄ are indicated in Figure 2.4.
Let us note here that, by definition, a set and its complement are always disjoint.

Figure 2.4 Sets defined in Example 2.3

The definitions of union and intersection can be directly generalized to those
involving any arbitrary number (finite or countably infinite) of sets. Thus, the set

$$A_1 \cup A_2 \cup \cdots \cup A_n = \bigcup_{j=1}^{n} A_j \tag{2.5}$$

stands for the set of all elements belonging to one or more of the sets Aⱼ,
j = 1, 2, ..., n. The intersection

$$A_1 A_2 \cdots A_n = \bigcap_{j=1}^{n} A_j \tag{2.6}$$

is the set of all elements common to all Aⱼ, j = 1, 2, ..., n. The sets
Aⱼ, j = 1, 2, ..., n, are disjoint if

$$A_i A_j = \varnothing \quad \text{for every } i, j \ (i \neq j). \tag{2.7}$$

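Using Venn diagrams or analytical procedures, it is easy to verify that union
and intersection operations are associative, commutative, and distributive; that is,

$$
\begin{aligned}
(A \cup B) \cup C &= A \cup (B \cup C) = A \cup B \cup C,\\
A \cup B &= B \cup A,\\
(AB)C &= A(BC) = ABC,\\
AB &= BA,\\
A(B \cup C) &= (AB) \cup (AC).
\end{aligned}
\tag{2.8}
$$

Clearly, we also have

$$
\begin{aligned}
A \cup A &= AA = A,\\
A \cup \varnothing &= A,\\
A\varnothing &= \varnothing,\\
A \cup S &= S,\\
AS &= A,\\
A \cup \bar{A} &= S,\\
A\bar{A} &= \varnothing.
\end{aligned}
\tag{2.9}
$$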


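Moreover, the following useful relations hold, all of which can be easily verified
using Venn diagrams:

$$
\begin{aligned}
A \cup (BC) &= (A \cup B)(A \cup C),\\
A \cup B &= A \cup (\bar{A}B) = A + (\bar{A}B),\\
\overline{A \cup B} &= \bar{A}\bar{B},\\
\overline{AB} &= \bar{A} \cup \bar{B},\\
\overline{\bigcup_{j=1}^{n} A_j} &= \bigcap_{j=1}^{n} \bar{A}_j,\\
\overline{\bigcap_{j=1}^{n} A_j} &= \bigcup_{j=1}^{n} \bar{A}_j.
\end{aligned}
\tag{2.10}
$$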




Fundamentals of Probability and Statistics for Engineers 


The second relation in Equations (2.10) gives the union of two sets in terms 
of the union of two disjoint sets. As we will see, this representation is useful in 
probability calculations. The last two relations in Equations (2.10) are referred 
to as DeMorgan’s laws. 
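The identities in Equations (2.8) to (2.10) can also be checked mechanically on a small finite space. The Python sketch below is only an illustration, not a proof for infinite sets: it takes S = {1, 2, 3} (an arbitrary choice that keeps the check exhaustive), represents sets as `frozenset`s with `|` for union and `&` for intersection, and tests every triple of subsets A, B, C. The helper names `subsets`, `comp`, and `identities_hold` are hypothetical.

```python
from itertools import combinations, product

# A small space, chosen only so that every triple of subsets can be checked.
S = frozenset({1, 2, 3})


def subsets(s):
    """Return all subsets of s (its power set) as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comp(x):
    """Complement of x with respect to the space S."""
    return S - x


def identities_hold(A, B, C):
    # | is union and & is intersection (the text's AB).
    return all([
        A & (B | C) == (A & B) | (A & C),                # distributive law, (2.8)
        A | (B & C) == (A | B) & (A | C),                # first relation of (2.10)
        A | B == A | (comp(A) & B),                      # A ∪ B = A ∪ ĀB ...
        not (A & (comp(A) & B)),                         # ... with A and ĀB disjoint
        comp(A | B) == comp(A) & comp(B),                # De Morgan, two sets
        comp(A & B) == comp(A) | comp(B),
        comp(A | B | C) == comp(A) & comp(B) & comp(C),  # De Morgan, n = 3
        comp(A & B & C) == comp(A) | comp(B) | comp(C),
    ])


power_set = subsets(S)
all_ok = all(identities_hold(A, B, C) for A, B, C in product(power_set, repeat=3))
print(len(power_set), all_ok)  # 8 True
```

Because all 8 × 8 × 8 = 512 triples are checked, a single counterexample would make `all_ok` false.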


2.2 SAMPLE SPACE AND PROBABILITY MEASURE 

In probability theory, we are concerned with an experiment with an outcome 
depending on chance, which is called a random experiment. It is assumed that all 
possible distinct outcomes of a random experiment are known and that they are 
elements of a fundamental set known as the sample space. Each possible
outcome is called a sample point, and an event is generally referred to as a subset of
the sample space having one or more sample points as its elements. 

It is important to point out that, for a given random experiment, the 
associated sample space is not unique and its construction depends upon the 
point of view adopted as well as the questions to be answered. For example,
100 Ω resistors are being manufactured by an industrial firm. Their values,
owing to inherent inaccuracies in the manufacturing and measurement
processes, may range from 99 to 101 Ω. A measurement taken of a resistor is a
random experiment for which the possible outcomes can be defined in a variety
of ways depending upon the purpose for performing such an experiment. On
the one hand, if, for a given user, a resistor with resistance range of 99.9-100.1 Ω
is considered acceptable, and unacceptable otherwise, it is adequate to define
the sample space as one consisting of two elements: ‘acceptable’ and
‘unacceptable’. On the other hand, from the viewpoint of another user, possible
outcomes may be the ranges 99-99.5 Ω, 99.5-100 Ω, 100-100.5 Ω, and
100.5-101 Ω. The sample space in this case has four sample points. Finally, if
each possible reading is a possible outcome, the sample space is now a real line
from 99 to 101 on the ohm scale; there is an uncountably infinite number of
sample points, and the sample space is a nonenumerable set.

To illustrate that a sample space is not fixed by the action of performing the 
experiment but by the point of view adopted by the observer, consider an 
energy negotiation between the United States and another country. From the
point of view of the US government, success and failure may be looked on as
the only possible outcomes. To the consumer, however, a set of more direct 
possible outcomes may consist of price increases and decreases for gasoline 
purchases. 

The description of sample space, sample points, and events shows that they 
fit nicely into the framework of set theory, a framework within which the 
analysis of outcomes of a random experiment can be performed. All relations 
between outcomes or events in probability theory can be described by sets and 
set operations. Consider a space S of elements a, b, c, ..., and with subsets
A, B, C, ...; some of the corresponding sets and their probability meanings are
given in Table 2.1. As Table 2.1 shows, the empty set ∅ is considered an
impossible event since no possible outcome is an element of the empty set.
Also, by ‘occurrence of an event’ we mean that the observed outcome is an
element of that set. For example, event A ∪ B is said to occur if and only if the
observed outcome is an element of A or B or both.

Table 2.1 Corresponding statements in set theory and probability

| Set theory | Probability theory |
|---|---|
| Space, S | Sample space, sure event |
| Empty set, ∅ | Impossible event |
| Elements a, b, ... | Sample points a, b, ... (or simple events) |
| Sets A, B, ... | Events A, B, ... |
| A | Event A occurs |
| Ā | Event A does not occur |
| A ∪ B | At least one of A and B occurs |
| AB | Both A and B occur |
| A ⊂ B | A is a subevent of B (i.e. the occurrence of A necessarily implies the occurrence of B) |
| AB = ∅ | A and B are mutually exclusive (i.e. they cannot occur simultaneously) |

Example 2.4. Consider an experiment of counting the number of left-turning
cars at an intersection in a group of 100 cars. The possible outcomes (possible
numbers of left-turning cars) are 0, 1, 2, ..., 100. Then, the sample space S is
S = {0, 1, 2, ..., 100}. Each element of S is a sample point or a possible
outcome. The subset A = {0, 1, 2, ..., 50} is the event that there are 50 or fewer
cars turning left. The subset B = {40, 41, ..., 60} is the event that between 40
and 60 (inclusive) cars take left turns. The set A ∪ B is the event of 60 or fewer
cars turning left. The set A ∩ B is the event that the number of left-turning cars
is between 40 and 50 (inclusive). Let C = {80, 81, ..., 100}. Events A and C are
mutually exclusive since they cannot occur simultaneously. Hence, disjoint sets
are mutually exclusive events in probability theory.
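Example 2.4 translates directly into Python's built-in `set` type. In this sketch the names `S`, `A`, `B`, and `C` mirror the example; the printed minima and maxima confirm the verbal descriptions given for A ∪ B and A ∩ B, and `isdisjoint` confirms that A and C are mutually exclusive.

```python
# Sample space and events of Example 2.4 (numbers of left-turning cars).
S = set(range(0, 101))   # 0, 1, ..., 100
A = set(range(0, 51))    # 50 or fewer cars turn left
B = set(range(40, 61))   # 40 to 60 cars (inclusive)
C = set(range(80, 101))  # 80 to 100 cars (inclusive)

assert A <= S and B <= S and C <= S  # every event is a subset of S

union_AB = A | B  # A ∪ B
inter_AB = A & B  # A ∩ B

print(min(union_AB), max(union_AB))  # 0 60   ("60 or fewer")
print(min(inter_AB), max(inter_AB))  # 40 50  ("between 40 and 50")
print(A.isdisjoint(C))               # True   (A and C mutually exclusive)
```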


2.2.1 AXIOMS OF PROBABILITY 

We now introduce the notion of a probability function. Given a random
experiment, a finite number P(A) is assigned to every event A in the sample space S of
all possible events. The number P(A) is a function of set A and is assumed to 
be defined for all sets in S. It is thus a set function, and P(A) is called the 
probability measure of A or simply the probability of A. It is assumed to have the 
following properties (axioms of probability): 
• Axiom 1: P(A) ≥ 0 (nonnegative).

• Axiom 2: P(S) = 1 (normed).

• Axiom 3: for a countable collection of mutually exclusive events A₁, A₂, ... in S,

$$P(A_1 \cup A_2 \cup \cdots) = P\Big(\sum_j A_j\Big) = \sum_j P(A_j) \quad \text{(additive)}. \tag{2.11}$$

These three axioms define a countably additive and nonnegative set function
P(A), A ⊂ S. As we shall see, they constitute a sufficient set of postulates from
which all useful properties of the probability function can be derived. Let us 
give below some of these important properties. 

First, P(∅) = 0. Since S and ∅ are disjoint, we see from Axiom 3 that

$$P(S) = P(S + \varnothing) = P(S) + P(\varnothing).$$

It then follows from Axiom 2 that

$$1 = 1 + P(\varnothing)$$

or

$$P(\varnothing) = 0.$$


Second, if A ⊂ C, then P(A) ≤ P(C). Since A ⊂ C, one can write

$$A + B = C,$$

where B is a subset of C and disjoint from A. Axiom 3 then gives

$$P(C) = P(A + B) = P(A) + P(B).$$

Since P(B) ≥ 0 as required by Axiom 1, we have the desired result.
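The first two derived properties can be checked numerically on a simple probability space. The sketch below assumes one roll of a fair die with equally likely outcomes, so that P(E) = |E|/6; this is an assumption made only for illustration, since the axioms themselves do not require equally likely outcomes. It uses exact `Fraction` arithmetic and enumerates all 64 events; the helper names `P` and `all_events` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# One roll of a fair die; equally likely outcomes are an assumption of this sketch.
S = frozenset(range(1, 7))


def P(event):
    """P(E) = |E| / |S| for equally likely outcomes."""
    assert event <= S, "events must be subsets of S"
    return Fraction(len(event), len(S))


def all_events(space):
    """Every subset of the space (2**6 = 64 events here)."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


events = all_events(S)

# Axioms 1 and 2, and Axiom 3 for pairs of mutually exclusive events.
assert all(P(E) >= 0 for E in events)
assert P(S) == 1
assert all(P(E | F) == P(E) + P(F)
           for E in events for F in events if not (E & F))

# Derived properties: P(∅) = 0, and A ⊂ C implies P(A) ≤ P(C).
print(P(frozenset()))                                                # 0
print(all(P(A) <= P(C) for A in events for C in events if A <= C))  # True
```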
Third, given two arbitrary events A and B, we have

$$P(A \cup B) = P(A) + P(B) - P(AB). \tag{2.12}$$


In order to show this, let us write A ∪ B in terms of the union of two
mutually exclusive events. From the second relation in Equations (2.10),
we write

$$A \cup B = A + \bar{A}B.$$
Figure 2.5 Venn diagram for derivation of Equation (2.12)


Hence, using Axiom 3,

$$P(A \cup B) = P(A + \bar{A}B) = P(A) + P(\bar{A}B). \tag{2.13}$$

Furthermore, we note

$$AB + \bar{A}B = B.$$

Hence, again using Axiom 3,

$$P(AB) + P(\bar{A}B) = P(B),$$

or

$$P(\bar{A}B) = P(B) - P(AB).$$

Substitution of this equation into Equation (2.13) yields Equation (2.12).

Equation (2.12) can also be verified by inspecting the Venn diagram in Figure
2.5. The sum P(A) + P(B) counts twice the events belonging to the shaded
area AB. Hence, in computing P(A ∪ B), the probability associated with
one AB must be subtracted from P(A) + P(B), giving Equation (2.12).

The important result given by Equation (2.12) can be immediately generalized
to the union of three or more events. Using the same procedure, we can
show that, for arbitrary events A, B, and C,

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC), \tag{2.14}$$
and, in the case of n events,

$$P\Big(\bigcup_{j=1}^{n} A_j\Big) = \sum_{j=1}^{n} P(A_j) - \sum_{1 \le i < j \le n} P(A_i A_j) + \sum_{1 \le i < j < k \le n} P(A_i A_j A_k) - \cdots + (-1)^{n+1} P(A_1 A_2 \cdots A_n), \tag{2.15}$$

where Aⱼ, j = 1, 2, ..., n, are arbitrary events.
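Equation (2.15) can be evaluated term by term and compared with a direct computation of the probability of the union. The Python sketch below assumes 12 equally likely outcomes and four hypothetical events chosen only so that they overlap in several ways; the function name `union_prob_inclusion_exclusion` is ours, not the text's. For n = 2 the alternating sum reduces to Equation (2.12), and for n = 3 to Equation (2.14).

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

# Twelve equally likely outcomes (an illustrative assumption).
S = frozenset(range(1, 13))


def P(event):
    return Fraction(len(event), len(S))


def union_prob_inclusion_exclusion(events):
    """Right-hand side of Equation (2.15): add probabilities of single events,
    subtract those of pairwise intersections, add triple intersections, ..."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * P(reduce(frozenset.intersection, group))
    return total


# Hypothetical overlapping events.
A1 = frozenset({1, 2, 3, 4, 5, 6})
A2 = frozenset({4, 5, 6, 7, 8})
A3 = frozenset({2, 4, 6, 8, 10, 12})
A4 = frozenset({1, 6, 11})
events = [A1, A2, A3, A4]

direct = P(frozenset().union(*events))  # P(A1 ∪ A2 ∪ A3 ∪ A4) counted directly
print(direct, union_prob_inclusion_exclusion(events))  # 11/12 11/12
```

Exact fractions are used so that the two sides can be compared with `==` rather than with a floating-point tolerance.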

Example 2.5. Let us go back to Example 2.4 and assume that probabilities
P(A), P(B), and P(C) are known. We wish to compute P(A ∪ B) and P(A ∪ C).

Probability P(A ∪ C), the probability of having either 50 or fewer cars
turning left or between 80 and 100 cars turning left, is simply P(A) + P(C). This
follows from Axiom 3, since A and C are mutually exclusive. However,
P(A ∪ B), the probability of having 60 or fewer cars turning left, is found from

$$P(A \cup B) = P(A) + P(B) - P(AB).$$

The information given above is thus not sufficient to determine this probability,
and we need the additional information, P(AB), which is the probability of
having between 40 and 50 cars turning left.

With the statement of three axioms of probability, we have completed the 
mathematical description of a random experiment. It consists of three
fundamental constituents: a sample space S, a collection of events A, B, ..., and the
probability function P. These three quantities constitute a probability space 
associated with a random experiment. 


2.2.2 ASSIGNMENT OF PROBABILITY 

The axioms of probability define the properties of a probability measure, which are 
consistent with our intuitive notions. However, they do not guide us in assigning 
probabilities to various events. For problems in applied sciences, a natural way to 
assign the probability of an event is through the observation of relative frequency. 
Assuming that a random experiment is performed a large number of times, say n,
then for any event A let n_A be the number of occurrences of A in the n trials and
define the ratio n_A/n as the relative frequency of A. Under stable or statistical
regularity conditions, it is expected that this ratio will tend to a unique limit as n 
becomes large. This limiting value of the relative frequency clearly possesses the 
properties required of the probability measure and is a natural candidate for 
the probability of A. This interpretation is used, for example, in saying that the 
probability of ‘heads’ in flipping a coin is 1/2. The relative frequency approach to
probability assignment is objective and consistent with the axioms stated in Section 
2.2.1 and is one commonly adopted in science and engineering. 
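The relative frequency interpretation can be illustrated by simulation. The sketch below flips a simulated coin n times with Python's `random` module and reports n_A/n for A = ‘heads’; the fairness of the coin (p = 0.5), the fixed seed, and the function name `relative_frequency` are assumptions of the sketch. The ratio typically fluctuates for small n and settles near 1/2 as n grows, although any single run is random.

```python
import random


def relative_frequency(n_trials, p_heads=0.5, seed=1):
    """Simulate n_trials coin flips and return n_A / n for A = 'heads'.

    p_heads = 0.5 models a fair coin; both it and the seed are assumptions
    of this sketch, not part of the text.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n_heads = sum(rng.random() < p_heads for _ in range(n_trials))
    return n_heads / n_trials


for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```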

Another common but more subjective approach to probability assignment is 
that of relative likelihood. When it is not feasible or is impossible to perform an 
experiment a large number of times, the probability of an event may be assigned 
as a result of subjective judgement. The statement ‘there is a 40% probability of 
rain tomorrow’ is an example in this interpretation, where the number 0.4 is 
assigned on the basis of available information and professional judgement. 

In most problems considered in this book, probabilities of some simple but 
basic events are generally assigned by using either of the two approaches. Other 
probabilities of interest are then derived through the theory of probability. 
Example 2.5 gives a simple illustration of this procedure where the probabilities 
of interest, P(A ∪ B) and P(A ∪ C), are derived upon assigning probabilities to
simple events A, B, and C. 


2.3 STATISTICAL INDEPENDENCE 


Let us pose the following question: given individual probabilities P(A) and P(B)
of two events A and B, what is P(AB), the probability that both A and B will
occur? Upon a little reflection, it is not difficult to see that the knowledge of P(A)
and P(B) is not sufficient to determine P(AB) in general. This is so because 
P(AB) deals with joint behavior of the two events whereas P(A) and P(B) are 
probabilities associated with individual events and do not yield information on 
their joint behavior. Let us then consider a special case in which the occurrence 
or nonoccurrence of one does not affect the occurrence or nonoccurrence of the 
other. In this situation events A and B are called statistically independent or 
simply independent and it is formalized by Definition 2.1. 

Definition 2.1. Two events A and B are said to be independent if and only if 


$$P(AB) = P(A)P(B). \tag{2.16}$$


To show that this definition is consistent with our intuitive notion of
independence, consider the following example.

Example 2.6. In a large number of trials of a random experiment, let n_A and
n_B be, respectively, the numbers of occurrences of two outcomes A and B, and
let n_AB be the number of times both A and B occur. Using the relative frequency
interpretation, the ratios n_A/n and n_B/n tend to P(A) and P(B), respectively, as n
becomes large. Similarly, n_AB/n tends to P(AB). Let us now confine our
attention to only those outcomes in which A is realized. If A and B are independent,
we expect that the ratio n_AB/n_A also tends to P(B) as n_A becomes large. The
independence assumption then leads to the observation that

$$\frac{n_{AB}}{n_A} \to P(B).$$

This then gives

$$\frac{n_{AB}}{n} = \frac{n_{AB}}{n_A} \cdot \frac{n_A}{n} \approx P(B)\,\frac{n_A}{n}$$

or, in the limit as n becomes large,

$$P(AB) = P(A)P(B),$$

which is the definition of independence introduced above.
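The argument of Example 2.6 can be mimicked by simulation. The sketch below assumes two separate rolls of a fair die, takes A as ‘the first die is even’ and B as ‘the second die shows 1 or 2’ (events chosen here for illustration, with P(A) = 1/2 and P(B) = 1/3), and compares n_AB/n_A with P(B) and n_AB/n with the product of the individual relative frequencies. The seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(2024)  # fixed seed for reproducibility
n = 200_000
n_A = n_B = n_AB = 0

for _ in range(n):
    first, second = rng.randint(1, 6), rng.randint(1, 6)  # two separate rolls
    a = first % 2 == 0   # event A: first die is even, P(A) = 1/2
    b = second <= 2      # event B: second die shows 1 or 2, P(B) = 1/3
    n_A += a
    n_B += b
    n_AB += a and b      # both A and B occur

print(n_AB / n_A)                       # close to P(B) = 1/3
print(n_AB / n, (n_A / n) * (n_B / n))  # both close to P(A)P(B) = 1/6
```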

Example 2.7. In launching a satellite, the probability of an unsuccessful
launch is q. What is the probability that two successive launches are
unsuccessful? Assuming that satellite launchings are independent events, the answer to
the above question is simply q². One can argue that these two events are not
really completely independent, since the satellites are manufactured by using similar
processes and launched by the same launcher. It is thus likely that the failures of
both are attributable to the same source. However, we accept this answer as
reasonable because, on the one hand, the independence assumption is
acceptable since a great many unknowns are involved, any of which can be
made accountable for the failure of a launch. On the other hand, the simplicity
of computing the joint probability makes the independence assumption
attractive. In physical problems, therefore, the independence assumption is often
made whenever it is considered to be reasonable.

Care should be exercised in extending the concept of independence to more 
than two events. In the case of three events, $A_1$, $A_2$, and $A_3$, for example, they 
are mutually independent if and only if

$$P(A_jA_k) = P(A_j)P(A_k), \qquad j \neq k;\ j, k = 1, 2, 3, \tag{2.17}$$

and

$$P(A_1A_2A_3) = P(A_1)P(A_2)P(A_3). \tag{2.18}$$

Equation (2.18) is required because pairwise independence does not generally 
lead to mutual independence. Consider, for example, three events $A_1$, $A_2$, and 
$A_3$ defined by

$$A_1 = B_1 \cup B_2, \qquad A_2 = B_1 \cup B_3, \qquad A_3 = B_2 \cup B_3,$$

where $B_1$, $B_2$, and $B_3$ are mutually exclusive, each occurring with probability 1/4. 
It is easy to calculate the following:

$$\begin{aligned}
P(A_1) &= P(B_1 \cup B_2) = P(B_1) + P(B_2) = \tfrac{1}{2},\\
P(A_2) &= P(A_3) = \tfrac{1}{2},\\
P(A_1A_2) &= P[(B_1 \cup B_2) \cap (B_1 \cup B_3)] = P(B_1) = \tfrac{1}{4},\\
P(A_1A_3) &= P(A_2A_3) = \tfrac{1}{4},\\
P(A_1A_2A_3) &= P[(B_1 \cup B_2) \cap (B_1 \cup B_3) \cap (B_2 \cup B_3)] = P(\varnothing) = 0.
\end{aligned}$$

We see that Equation (2.17) is satisfied for every j and k in this case, but 
Equation (2.18) is not, since $P(A_1)P(A_2)P(A_3) = 1/8$. In other words, events $A_1$, $A_2$, 
and $A_3$ are pairwise independent but they are not mutually independent.
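The counterexample can be checked with exact rational arithmetic. A minimal Python sketch, assuming a finite sample space of four equally likely points in which b1, b2, b3 realize $B_1$, $B_2$, $B_3$ and a fourth point b4 (an assumption, needed so that the probabilities sum to 1) lies outside all three:

```python
from fractions import Fraction
from itertools import combinations

# Four equally likely sample points; "b4" carries the probability 1/4
# that lies outside B1, B2, B3 (assumed, so that P(S) = 1).
prob = {"b1": Fraction(1, 4), "b2": Fraction(1, 4),
        "b3": Fraction(1, 4), "b4": Fraction(1, 4)}


def P(event):
    """Probability of an event given as a set of sample points."""
    return sum((prob[s] for s in event), Fraction(0))


A = {1: {"b1", "b2"},   # A1 = B1 U B2
     2: {"b1", "b3"},   # A2 = B1 U B3
     3: {"b2", "b3"}}   # A3 = B2 U B3

# Equation (2.17): every pair factorizes
pairwise = all(P(A[j] & A[k]) == P(A[j]) * P(A[k])
               for j, k in combinations(A, 2))
# Equation (2.18): the triple intersection must factorize as well
triple = P(A[1] & A[2] & A[3]) == P(A[1]) * P(A[2]) * P(A[3])

print(pairwise, triple)   # True False
```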

In general, therefore, we have Definition 2.2 for mutual independence of 
n events.

Definition 2.2. Events $A_1, A_2, \ldots, A_n$ are mutually independent if and only if, 
with $k_1, k_2, \ldots, k_m$ being any set of integers such that $1 \le k_1 < k_2 < \cdots < k_m \le n$ 
and $m = 2, 3, \ldots, n$,

$$P(A_{k_1}A_{k_2}\cdots A_{k_m}) = P(A_{k_1})P(A_{k_2})\cdots P(A_{k_m}). \tag{2.19}$$

The total number of equations defined by Equation (2.19) is $2^n - n - 1$.
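The count $2^n - n - 1$ arises because Equation (2.19) imposes one condition for every subset of {1, ..., n} except the empty set and the n singletons. A short Python sketch (the function name `independence_conditions` is hypothetical) enumerates these index sets and compares their number with the formula for small n:

```python
from itertools import combinations


def independence_conditions(n):
    """Index sets (k1, ..., km), 1 <= k1 < ... < km <= n, m = 2..n, of Equation (2.19)."""
    return [ks
            for m in range(2, n + 1)
            for ks in combinations(range(1, n + 1), m)]


for n in range(2, 7):
    print(n, len(independence_conditions(n)), 2**n - n - 1)
```

For n = 3 the list is (1, 2), (1, 3), (2, 3), (1, 2, 3): the three pairwise conditions of Equation (2.17) plus the single triple condition of Equation (2.18).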

Example 2.8. Problem: a system consisting of five components is in working 
order only when each component is functioning (‘good’). Let $S_i$, $i = 1, \ldots, 5$, be 
the event that the ith component is good and assume $P(S_i) = p_i$. What is the 
probability q that the system fails?

Answer: assuming that the five components perform in an independent 
manner, it is easier to determine q through finding the probability of system 
success p. We have from the statement of the problem

$$p = P(S_1S_2S_3S_4S_5).$$

Equation (2.19) thus gives, due to mutual independence of $S_1, S_2, \ldots, S_5$,

$$p = P(S_1)P(S_2)\cdots P(S_5) = p_1p_2p_3p_4p_5. \tag{2.20}$$

Hence,

$$q = 1 - p = 1 - p_1p_2p_3p_4p_5. \tag{2.21}$$
An expression for q may also be obtained by noting that the system fails if 
any one or more of the five components fail, or

$$q = P(\bar{S}_1 \cup \bar{S}_2 \cup \bar{S}_3 \cup \bar{S}_4 \cup \bar{S}_5), \tag{2.22}$$

where $\bar{S}_i$ is the complement of $S_i$ and represents a bad ith component. Clearly, 
$P(\bar{S}_i) = 1 - p_i$. Since events $\bar{S}_i$, $i = 1, \ldots, 5$, are not mutually exclusive, the 
calculation of q with use of Equation (2.22) requires the use of Equation (2.15). 
Another approach is to write the unions in Equation (2.22) in terms of unions of 
mutually exclusive events so that Axiom 3 (Section 2.2.1) can be directly utilized. 
The result is, upon applying the second relation in Equations (2.10),

$$\bar{S}_1 \cup \bar{S}_2 \cup \bar{S}_3 \cup \bar{S}_4 \cup \bar{S}_5 = \bar{S}_1 + S_1\bar{S}_2 + S_1S_2\bar{S}_3 + S_1S_2S_3\bar{S}_4 + S_1S_2S_3S_4\bar{S}_5,$$

where the ‘∪’ signs are replaced by ‘+’ signs on the right-hand side to stress the 
fact that they are mutually exclusive events. Axiom 3 then leads to

$$q = P(\bar{S}_1) + P(S_1\bar{S}_2) + P(S_1S_2\bar{S}_3) + P(S_1S_2S_3\bar{S}_4) + P(S_1S_2S_3S_4\bar{S}_5),$$

and, using statistical independence,

$$q = (1 - p_1) + p_1(1 - p_2) + p_1p_2(1 - p_3) + p_1p_2p_3(1 - p_4) + p_1p_2p_3p_4(1 - p_5). \tag{2.23}$$

Some simple algebra (the sum telescopes) will show that this result reduces to Equation (2.21).
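As a numerical check that Equations (2.21) and (2.23) agree, the following Python sketch evaluates both for a hypothetical set of component reliabilities (the values in `p` are illustrative, not from the text). The loop in `q_disjoint` accumulates the probabilities of the mutually exclusive events $S_1 \cdots S_{i-1}\bar{S}_i$ (first i − 1 components good, ith bad):

```python
from math import prod


def q_direct(p):
    """Equation (2.21): q = 1 - p1 p2 ... pn for independent components in series."""
    return 1 - prod(p)


def q_disjoint(p):
    """Equation (2.23): sum over i of P(first i-1 components good, ith bad)."""
    q = 0.0
    all_good_so_far = 1.0           # p1 p2 ... p_{i-1}
    for p_i in p:
        q += all_good_so_far * (1 - p_i)
        all_good_so_far *= p_i
    return q


p = [0.99, 0.95, 0.97, 0.90, 0.98]  # hypothetical reliabilities p1..p5
print(q_direct(p), q_disjoint(p))
```

The two printed values coincide up to floating-point rounding, as the telescoping argument predicts.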

Let us mention here that probability p is called the reliability of the system in 
systems engineering. 


2.4 CONDITIONAL PROBABILITY 

The concept of conditional probability is a very useful one. Given two events A 
and B associated with a random experiment, probability P(A|B) is defined as 
the conditional probability of A, given that B has occurred. Intuitively, this 
probability can be interpreted by means of relative frequencies described in 
Example 2.6, except that events A and B are no longer assumed to be independent. 
The number of outcomes where both A and B occur is $n_{AB}$. Hence, given 
that event B has occurred, the relative frequency of A is $n_{AB}/n_B$. Thus we 
have, in the limit as $n_B$ becomes large,

$$\frac{n_{AB}}{n_B} = \frac{n_{AB}/n}{n_B/n} \to \frac{P(AB)}{P(B)}.$$

This relationship leads to Definition 2.3. 
Definition 2.3. The conditional probability of A given that B has occurred is 
given by

$$P(A|B) = \frac{P(AB)}{P(B)}, \qquad P(B) \neq 0. \tag{2.24}$$


Definition 2.3 is meaningless if P(B) = 0. 

It is noted that, in the discussion of conditional probabilities, we are dealing 
with a contracted sample space in which B is known to have occurred. In other 
words, B replaces S as the sample space, and the conditional probability P(A\B) 
is found as the probability of A with respect to this new sample space. 
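The contracted-sample-space interpretation can be seen on a small finite example. The following Python sketch uses two fair dice with hypothetical events A (the sum is at least 10) and B (the first die shows 6), neither of which comes from the text, and computes P(A|B) both from Equation (2.24) and by counting within B alone:

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely ordered outcomes of two fair dice
S = list(product(range(1, 7), repeat=2))


def P(event, space=S):
    """Probability of `event` (a predicate) when all points of `space` are equally likely."""
    return Fraction(sum(1 for s in space if event(s)), len(space))


def cond_prob(a, b, space=S):
    """Equation (2.24): P(A|B) = P(AB)/P(B), undefined when P(B) = 0."""
    p_b = P(b, space)
    if p_b == 0:
        raise ValueError("P(B) = 0: conditional probability is undefined")
    return P(lambda s: a(s) and b(s), space) / p_b


def A(s):
    return s[0] + s[1] >= 10   # hypothetical event: sum of the dice is at least 10


def B(s):
    return s[0] == 6           # hypothetical event: first die shows 6


via_ratio = cond_prob(A, B)
contracted = [s for s in S if B(s)]      # B replaces S as the sample space
via_contraction = P(A, contracted)
print(P(A), via_ratio, via_contraction)  # 1/6 1/2 1/2
```

Both routes give 1/2, whereas the unconditional P(A) is 1/6; conditioning on B changes the probability of A, so these two events are not independent.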

If A and B are independent, the occurrence of B has no effect on the 
occurrence or nonoccurrence of A. We thus expect 
P(A|B) = P(A), and Equation (2.24) gives

$$P(A) = \frac{P(AB)}{P(B)},$$

or

$$P(AB) = P(A)P(B),$$

which is precisely the definition of independence.

It is also important to point out that conditional probabilities are probabilities 
(i.e. they satisfy the three axioms of probability). Using Equation (2.24), we see that 
the first axiom is automatically satisfied, since P(AB) ≥ 0 and P(B) > 0. For the second 
axiom we need to show that

$$P(S|B) = 1.$$

This is certainly true, since

$$P(S|B) = \frac{P(SB)}{P(B)} = \frac{P(B)}{P(B)} = 1.$$

As for the third axiom, if $A_1, A_2, \ldots$ are mutually exclusive, then $A_1B, A_2B, \ldots$ 
are also mutually exclusive. Hence,

$$\begin{aligned}
P[(A_1 \cup A_2 \cup \cdots)|B] &= \frac{P[(A_1 \cup A_2 \cup \cdots)B]}{P(B)}
= \frac{P(A_1B \cup A_2B \cup \cdots)}{P(B)}\\
&= \frac{P(A_1B)}{P(B)} + \frac{P(A_2B)}{P(B)} + \cdots\\
&= P(A_1|B) + P(A_2|B) + \cdots,
\end{aligned}$$

and the third axiom holds.

The definition of conditional probability given by Equation (2.24) can be 
used not only to compute conditional probabilities but also to compute joint 
probabilities, as the following examples show. 

Example 2.9. Problem: let us reconsider Example 2.8 and ask the following 
question: what is the conditional probability that the first two components are 
good given that (a) the first component is good and (b) at least one of the two 
is good?

Answer: the event $S_1S_2$ means both are good components, and $S_1 \cup S_2$ is the 
event that at least one of the two is good. Thus, for question (a) and in view of 
Equation (2.24),

$$P(S_1S_2|S_1) = \frac{P(S_1S_2S_1)}{P(S_1)} = \frac{P(S_1S_2)}{P(S_1)} = \frac{p_1p_2}{p_1} = p_2.$$

This result is expected since $S_1$ and $S_2$ are independent. Intuitively, we see that 
this question is equivalent to one of computing $P(S_2)$. 
For question (b), we have

$$P(S_1S_2|S_1 \cup S_2) = \frac{P[S_1S_2(S_1 \cup S_2)]}{P(S_1 \cup S_2)}.$$

Now, $S_1S_2(S_1 \cup S_2) = S_1S_2$. Hence,

$$P(S_1S_2|S_1 \cup S_2) = \frac{P(S_1S_2)}{P(S_1 \cup S_2)} = \frac{P(S_1S_2)}{P(S_1) + P(S_2) - P(S_1S_2)} = \frac{p_1p_2}{p_1 + p_2 - p_1p_2}.$$
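The two answers of Example 2.9 can be checked by brute-force enumeration of the four joint outcomes of two independently functioning components. A minimal Python sketch, with hypothetical reliabilities $p_1 = 0.9$ and $p_2 = 0.8$ and an illustrative function name:

```python
from itertools import product


def example_2_9(p1, p2):
    """Return (P(S1S2|S1), P(S1S2|S1 U S2)) by enumerating the four joint
    outcomes of two independently functioning components."""
    # outcome = (first good?, second good?); independence gives product probabilities
    prob = {(g1, g2): (p1 if g1 else 1 - p1) * (p2 if g2 else 1 - p2)
            for g1, g2 in product([True, False], repeat=2)}

    def P(pred):
        return sum(pr for outcome, pr in prob.items() if pred(outcome))

    both_good = P(lambda o: o[0] and o[1])
    part_a = both_good / P(lambda o: o[0])          # condition on S1
    part_b = both_good / P(lambda o: o[0] or o[1])  # condition on S1 U S2
    return part_a, part_b


p1, p2 = 0.9, 0.8  # hypothetical component reliabilities
a, b = example_2_9(p1, p2)
print(a, p2)                                  # part (a) should equal p2
print(b, p1 * p2 / (p1 + p2 - p1 * p2))       # part (b) vs closed form
```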


Example 2.10. Problem: in a game of cards, determine the probability of 
drawing, without replacement, two aces in succession from a standard 52-card deck.

Answer: let $A_1$ be the event that the first card drawn is an ace, and similarly 
for $A_2$. We wish to compute $P(A_1A_2)$. From Equation (2.24) we write

$$P(A_1A_2) = P(A_2|A_1)P(A_1). \tag{2.25}$$

Now, $P(A_1) = 4/52$ and $P(A_2|A_1) = 3/51$ (there are 51 cards left and three of 
them are aces). Therefore,

$$P(A_1A_2) = \left(\frac{3}{51}\right)\left(\frac{4}{52}\right) = \frac{1}{221}.$$
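The result 1/221 can be confirmed by counting. This Python sketch (an illustration, not from the text) labels the 52 cards, enumerates all 52 × 51 ordered draws without replacement, and compares the fraction of ace-ace draws with Equation (2.25):

```python
from fractions import Fraction
from itertools import permutations

deck = ["ace"] * 4 + ["other"] * 48          # a 52-card deck with four aces
draws = list(permutations(range(52), 2))     # ordered pairs drawn without replacement
hits = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")

brute_force = Fraction(hits, len(draws))
chain_rule = Fraction(4, 52) * Fraction(3, 51)   # P(A1) P(A2|A1), Equation (2.25)
print(brute_force, chain_rule)                   # 1/221 1/221
```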
Equation (2.25) is seen to be useful for finding joint probabilities. Its extension 
to more than two events has the form

$$P(A_1A_2\cdots A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1A_2)\cdots P(A_n|A_1A_2\cdots A_{n-1}), \tag{2.26}$$

where $P(A_1A_2\cdots A_{n-1}) > 0$, so that every conditioning event has positive probability. 
This can be verified by successive applications of Equation (2.24).
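Equation (2.26) turns the probability of drawing k aces in succession into a product of conditional probabilities, the ith factor being (4 − i + 1)/(52 − i + 1). A Python sketch under that reading, with hypothetical function names, compares the chain-rule product with direct enumeration for k = 3:

```python
from fractions import Fraction
from itertools import permutations


def chain_rule_all_aces(k, aces=4, cards=52):
    """P(A1 A2 ... Ak) by Equation (2.26), for 0 <= k <= aces:
    the ith factor is P(Ai | A1...A(i-1)) = (aces - i + 1) / (cards - i + 1)."""
    p = Fraction(1)
    for i in range(k):              # i = 0 corresponds to the first draw
        p *= Fraction(aces - i, cards - i)
    return p


def brute_force_all_aces(k, aces=4, cards=52):
    """Fraction of ordered k-card draws (without replacement) that are all aces;
    cards 0 .. aces-1 play the role of the aces."""
    total = hits = 0
    for draw in permutations(range(cards), k):
        total += 1
        hits += all(card < aces for card in draw)
    return Fraction(hits, total)


print(chain_rule_all_aces(3), brute_force_all_aces(3))   # 1/5525 1/5525
```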

In another direction, let us state a useful theorem relating the probability of 
an event to conditional probabilities. 

Theorem 2.1: theorem of total probability. Suppose that events $B_1, B_2, \ldots$, and 
$B_n$ are mutually exclusive and exhaustive (i.e. $S = B_1 + B_2 + \cdots + B_n$). Then, 
for an arbitrary event A,

$$P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + \cdots + P(A|B_n)P(B_n) = \sum_{j=1}^{n} P(A|B_j)P(B_j). \tag{2.27}$$

Proof of Theorem 2.1: referring to the Venn diagram in Figure 2.6, we can 
clearly write A as the union of mutually exclusive events $AB_1, AB_2, \ldots, AB_n$ (i.e. 
$A = AB_1 + AB_2 + \cdots + AB_n$). Hence,

$$P(A) = P(AB_1) + P(AB_2) + \cdots + P(AB_n),$$

which gives Equation (2.27) on application of the definition of conditional 
probability.



Figure 2.6 Venn diagram associated with total probability 
The utility of this result rests with the fact that the probabilities in the sum 
in Equation (2.27) are often more readily obtainable than the probability of A itself.

Example 2.11. Our interest is in determining the probability that a critical 
level of peak flow rate is reached during storms in a storm-sewer system on the 
basis of separate meteorological and hydrological measurements.

Let $B_i$, $i = 1, 2, 3$, be the different levels (low, medium, high) of precipitation 
caused by a storm and let $A_j$, $j = 1, 2$, denote, respectively, critical and noncritical 
levels of peak flow rate. Then probabilities $P(B_i)$ can be estimated from 
meteorological records and $P(A_j|B_i)$ can be estimated from runoff analysis. 
Since $B_1$, $B_2$, and $B_3$ constitute a set of mutually exclusive and exhaustive 
events, the desired probability, $P(A_1)$, can be found from

$$P(A_1) = P(A_1|B_1)P(B_1) + P(A_1|B_2)P(B_2) + P(A_1|B_3)P(B_3).$$

Assume the following information is available:

$$P(B_1) = 0.5, \qquad P(B_2) = 0.3, \qquad P(B_3) = 0.2,$$

and that $P(A_j|B_i)$ are as shown in Table 2.2. The value of $P(A_1)$ is given by

$$P(A_1) = 0(0.5) + 0.2(0.3) + 0.6(0.2) = 0.18.$$

Let us observe that in Table 2.2, the sum of the probabilities in each column is 
1.0 by virtue of the conservation of probability. There is, however, no such 
requirement for the sum of each row. 
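The computation of $P(A_1)$ in Example 2.11 is a direct application of Equation (2.27). A minimal Python sketch using the data of the example and Table 2.2 (the variable and function names are illustrative):

```python
# Data of Example 2.11: P(B_i) and the entries of Table 2.2
p_B = [0.5, 0.3, 0.2]                 # B1, B2, B3 = low, medium, high precipitation
p_A_given_B = {
    "A1": [0.0, 0.2, 0.6],            # critical peak flow
    "A2": [1.0, 0.8, 0.4],            # noncritical peak flow
}


def total_probability(cond, prior):
    """Equation (2.27): P(A) = sum_j P(A|B_j) P(B_j) over an exhaustive partition."""
    if abs(sum(prior) - 1.0) > 1e-9:
        raise ValueError("the B_j must be exhaustive: priors must sum to 1")
    return sum(c * p for c, p in zip(cond, prior))


p_A1 = total_probability(p_A_given_B["A1"], p_B)
print(round(p_A1, 4))                 # 0.18

# Each column of Table 2.2 sums to 1 (conservation of probability); rows need not.
column_sums = [a1 + a2 for a1, a2 in zip(p_A_given_B["A1"], p_A_given_B["A2"])]
print(column_sums)
```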

A useful result generally referred to as Bayes’ theorem can be derived based 
on the definition of conditional probability. Equation (2.24) permits us to write

$$P(AB) = P(A|B)P(B)$$

and

$$P(BA) = P(B|A)P(A).$$

Since P(AB) = P(BA), we have Theorem 2.2.

Table 2.2 Probabilities $P(A_j|B_i)$, for Example 2.11

          B1     B2     B3
    A1    0.0    0.2    0.6
    A2    1.0    0.8    0.4
Theorem 2.2: Bayes’ theorem. Let A and B be two arbitrary events with 
P(A) ≠ 0 and P(B) ≠ 0. Then

$$P(B|A) = \frac{P(A|B)P(B)}{P(A)}. \tag{2.28}$$


Combining this theorem with the total probability theorem we have a useful 
consequence:

$$P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum_{j=1}^{n} P(A|B_j)P(B_j)}, \tag{2.29}$$

for any i, where events $B_j$ represent a set of mutually exclusive and exhaustive 
events.

The simple result given by Equation (2.28) is called Bayes’ theorem after the 
English philosopher Thomas Bayes and is useful in the sense that it permits us 
to evaluate an a posteriori probability P(B|A) in terms of a priori information P(B) 
and P(A|B), as the following examples illustrate.

Example 2.12. Problem: a simple binary communication channel carries 
messages by using only two signals, say 0 and 1. We assume that, for a given 
binary channel, 40% of the time a 1 is transmitted; the probability that a 
transmitted 0 is correctly received is 0.90, and the probability that a transmitted 
1 is correctly received is 0.95. Determine (a) the probability of a 1 being 
received, and (b) given a 1 is received, the probability that 1 was transmitted. 
Answer: let

A = event that 1 is transmitted,

$\bar{A}$ = event that 0 is transmitted,

B = event that 1 is received,

$\bar{B}$ = event that 0 is received.

The information given in the problem statement gives us

$$P(A) = 0.4, \qquad P(\bar{A}) = 0.6;$$
$$P(B|A) = 0.95, \qquad P(\bar{B}|A) = 0.05;$$
$$P(\bar{B}|\bar{A}) = 0.90, \qquad P(B|\bar{A}) = 0.10,$$

and these are represented diagrammatically in Figure 2.7.

Figure 2.7 Probabilities associated with a binary channel, for Example 2.12

For part (a) we wish to find P(B). Since A and $\bar{A}$ are mutually exclusive and 
exhaustive, it follows from the theorem of total probability [Equation (2.27)] 
that

$$P(B) = P(B|A)P(A) + P(B|\bar{A})P(\bar{A}) = 0.95(0.4) + 0.1(0.6) = 0.44.$$


The probability of interest in part (b) is P(A|B), and this can be found using 
Bayes’ theorem [Equation (2.28)]. It is given by

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{0.95(0.4)}{0.44} \approx 0.864.$$

It is worth mentioning that P(B) in this calculation is found by means of the 
total probability theorem. Hence, Equation (2.29) is the one actually used here 
in finding P(A|B). In fact, probability P(A) in Equation (2.28) is often more 
conveniently found by using the total probability theorem.
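Both parts of Example 2.12 can be packaged as one small function. This is a sketch, assuming the parameterization below (probability that a 1 is sent, and the two correct-reception probabilities); the name `binary_channel` is hypothetical:

```python
def binary_channel(p_send1, p_recv1_given_1, p_recv0_given_0):
    """Return (P(B), P(A|B)) for the binary channel of Example 2.12,
    where A = '1 transmitted' and B = '1 received'."""
    p_send0 = 1 - p_send1
    p_recv1_given_0 = 1 - p_recv0_given_0
    # Theorem of total probability, Equation (2.27)
    p_B = p_recv1_given_1 * p_send1 + p_recv1_given_0 * p_send0
    # Bayes' theorem, Equation (2.28)
    p_A_given_B = p_recv1_given_1 * p_send1 / p_B
    return p_B, p_A_given_B


p_B, p_A_given_B = binary_channel(0.4, 0.95, 0.90)
print(f"P(B) = {p_B:.2f}, P(A|B) = {p_A_given_B:.3f}")  # P(B) = 0.44, P(A|B) = 0.864
```

The posterior is the ratio 0.38/0.44 = 0.8636..., which rounds to 0.864 at three decimal places.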

Example 2.13. Problem: from Example 2.11, determine $P(B_2|A_2)$, the probability 
that a noncritical level of peak flow rate will be caused by a medium-level storm.

Answer: from Equations (2.28) and (2.29) we have

$$\begin{aligned}
P(B_2|A_2) &= \frac{P(A_2|B_2)P(B_2)}{P(A_2)}
= \frac{P(A_2|B_2)P(B_2)}{P(A_2|B_1)P(B_1) + P(A_2|B_2)P(B_2) + P(A_2|B_3)P(B_3)}\\
&= \frac{0.8(0.3)}{1.0(0.5) + 0.8(0.3) + 0.4(0.2)} = 0.293.
\end{aligned}$$
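Equation (2.29) applies to every member of the partition at once. A hedged Python sketch, with a hypothetical `posterior` helper, applied to row $A_2$ of Table 2.2:

```python
def posterior(likelihoods, priors):
    """Equation (2.29): P(B_i|A) = P(A|B_i)P(B_i) / sum_j P(A|B_j)P(B_j)."""
    joint = [lik * pr for lik, pr in zip(likelihoods, priors)]
    p_A = sum(joint)                  # total probability, Equation (2.27)
    if p_A == 0:
        raise ValueError("P(A) = 0: the posterior is undefined")
    return [j / p_A for j in joint]


# Example 2.13: A2 = noncritical peak flow; B1, B2, B3 = low, medium, high storm
p_B = [0.5, 0.3, 0.2]
p_A2_given_B = [1.0, 0.8, 0.4]        # row A2 of Table 2.2
post = posterior(p_A2_given_B, p_B)
print(round(post[1], 3))              # 0.293
```

The returned list also gives $P(B_1|A_2)$ and $P(B_3|A_2)$, and its entries sum to 1, as posterior probabilities over a mutually exclusive and exhaustive set must.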


In closing, let us introduce the use of tree diagrams for dealing with more 
complicated experiments with ‘limited memory’. Consider again Example 2.12, 
now with a second stage added to the communication channel; Figure 2.8 
shows all the associated probabilities. We wish to determine P(C), the probability 
of receiving a 1 at the second stage.

Figure 2.8 A two-stage binary channel

Tree diagrams are useful for determining the behavior of this system when the 
system has a ‘one-stage’ memory; that is, when the outcome at the second stage is 
dependent only on what has happened at the first stage and not on outcomes at 
stages prior to the first. Mathematically, it follows from this property that

$$P(C|BA) = P(C|B), \qquad P(C|B\bar{A}) = P(C|B), \qquad \text{etc.} \tag{2.30}$$

The properties described above are commonly referred to as Markovian 
properties. Markov processes represent an important class of probabilistic 
processes, which are studied at a more advanced level.

Suppose that Equations (2.30) hold for the system described in Figure 2.8. 
The tree diagram gives the flow of conditional probabilities originating from 
the source. Starting from the transmitter, the tree diagram for this problem has 
the appearance shown in Figure 2.9. The top branch, for example, leads to the 
probability of the occurrence of event ABC, which is, according to Equations 
(2.26) and (2.30),

$$\begin{aligned}
P(ABC) &= P(A)P(B|A)P(C|BA)\\
&= P(A)P(B|A)P(C|B)\\
&= 0.4(0.95)(0.95) = 0.361.
\end{aligned}$$

The probability of C is then found by summing the probabilities of all events 
that end with C. Thus,

$$\begin{aligned}
P(C) &= P(ABC) + P(A\bar{B}C) + P(\bar{A}BC) + P(\bar{A}\bar{B}C)\\
&= 0.95(0.95)(0.4) + 0.1(0.05)(0.4) + 0.95(0.1)(0.6) + 0.1(0.9)(0.6)\\
&= 0.474.
\end{aligned}$$
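The tree-diagram computation can be reproduced by enumerating all eight paths. The Python sketch below assumes, as the numerical terms above indicate, that the second stage has the same transition probabilities as the first (0.95 and 0.90 for correct reception of 1 and 0); it uses the one-stage memory of Equations (2.30) to write each path probability as P(a)P(b|a)P(c|b):

```python
from itertools import product

P_A = {1: 0.4, 0: 0.6}             # P(first symbol sent), as in Example 2.12
# P(received | sent) for one stage, keyed (received, sent); assumed identical for both stages
channel = {(1, 1): 0.95, (0, 1): 0.05,
           (0, 0): 0.90, (1, 0): 0.10}


def path_probability(a, b, c):
    """P(sent a, stage-1 output b, stage-2 output c) = P(a) P(b|a) P(c|b),
    using the one-stage memory of Equations (2.30)."""
    return P_A[a] * channel[(b, a)] * channel[(c, b)]


paths = {abc: path_probability(*abc) for abc in product([1, 0], repeat=3)}
p_C = sum(p for (a, b, c), p in paths.items() if c == 1)
print(round(paths[(1, 1, 1)], 3), round(p_C, 3))   # 0.361 0.474
```

The four paths ending in a received 1 contribute 0.361, 0.002, 0.057, and 0.054, which sum to 0.474. Equivalently, $P(C) = P(C|B)P(B) + P(C|\bar{B})P(\bar{B}) = 0.95(0.44) + 0.1(0.56)$.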
PROBLEMS

2.1 Let A, B, and C be arbitrary sets. Determine which of the following relations are 
correct and which are incorrect: 

(a) ABC = AB(C U B). 

(b) AB = AUB. 

(c) AUB = AB. 

(d) {AU B)C = ABC. 
(e) AB

(f) (AB)(AC) = %. 

2.2 The second relation in Equations (2.10) expresses the union of two sets as the union 
of two disjoint sets (i.e. $A \cup B = A + \bar{A}B$). Express $A \cup B \cup C$ in terms of the union 
of disjoint sets where A, B, and C are arbitrary sets.

2.3 Verify DeMorgan’s laws, given by the last two equations of Equations (2.10). 

2.4 Let S = {1, 2, ..., 10}, A = {1, 3, 5}, B = {1, 4, 6}, and C = {2, 5, 7}. Determine the 
elements of the following sets:

(a) 5UC. 

(b) A ∪ B.

(c) AC. 

(d) A ∪ (BC).

(e) ABC. 

(f) 

(g) (AB) ∪ (BC) ∪ (CA).

2.5 Repeat Problem 2.4 if S= {x::0 < x < 10}, A = {x:l < x < 5}, B= {x-.\ < x < 6}, 
and C = {x:2 < x < 7}. 

2.6 Draw Venn diagrams of events A and B representing the following situations: 

(a) A and B are arbitrary. 

(b) If A occurs, B must occur. 

(c) If A occurs, B cannot occur. 

(d) A and B are independent. 

2.7 Let A, B, and C be arbitrary events. Find expressions for the events that, of A, B, and C:

(a) None occurs. 

(b) Only A occurs. 

(c) Only one occurs. 

(d) At least one occurs. 

(e) A occurs and either B or C occurs but not both.

(f) B and C occur, but A does not occur.

(g) Two or more occur. 

(h) At most two occur. 

(i) All three occur. 

2.8 Events A, B, and C are independent, with P(A) = a, P(B) = b, and P(C) = c.
Determine the following probabilities in terms of a, b, and c: 

(a) P(AB). 

(b) P(A ∪ B).

(c) P(AUB\B). 

(d) P(A ∪ B|C).

2.9 An engineering system has two components. Let us define the following events: 

A: first component is good; $\bar{A}$: first component is defective.

B: second component is good; $\bar{B}$: second component is defective.

Describe the following events in terms of A, $\bar{A}$, B, and $\bar{B}$:

(a) At least one of the components is good. 

(b) One is good and one is defective. 
2.10 For the two components described in Problem 2.9, tests have produced the following 
result:

$$P(A) = 0.8, \qquad P(B|A) = 0.85, \qquad P(\bar{B}|\bar{A}) = 0.75.$$

Determine the probability that: 

(a) The second component is good. 

(b) At least one of the components is good. 

(c) The first component is good given that the second is good. 

(d) The first component is good given that at most one component is good. 

For the two events A and B: 

(e) Are they independent? Verify your answer. 

(f) Are they mutually exclusive? Verify your answer. 

2.11 A satellite can fail for many possible reasons, two of which are computer failure 
and engine failure. For a given mission, it is known that: 

The probability of engine failure is 0.008. 

The probability of computer failure is 0.001. 

Given engine failure, the probability of satellite failure is 0.98. 

Given computer failure, the probability of satellite failure is 0.45. 

Given any other component failure, the probability of satellite failure is zero. 

(a) Determine the probability that a satellite fails. 

(b) Determine the probability that a satellite fails and is due to engine 
failure. 

(c) Assume that engines in different satellites perform independently. Given a 
satellite has failed as a result of engine failure, what is the probability that 
the same will happen to another satellite? 

2.12 Verify Equation (2.14). 

2.13 Show that, for arbitrary events $A_1, A_2, \ldots, A_n$,

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) \le P(A_1) + P(A_2) + \cdots + P(A_n).$$

This is known as Boole’s inequality. 

2.14 A box contains 20 parts, of which 5 are defective. Two parts are drawn at random 
from the box. What is the probability that: 

(a) Both are good? 

(b) Both are defective? 

(c) One is good and one is defective? 

2.15 An automobile braking device consists of three subsystems, all of which must work 
for the device to work. These systems are an electronic system, a hydraulic system, 
and a mechanical activator. In braking, the reliabilities (probabilities of success) of 
these units are 0.96, 0.95, and 0.95, respectively. Estimate the system reliability 
assuming that these subsystems function independently. 

Comment: systems of this type can be graphically represented as shown in 
Figure 2.10, in which subsystems A (electronic system), B (hydraulic system), and 
C (mechanical activator) are arranged in series. Consider the path a → b as the 
‘path to success’. A breakdown of any or all of A, B, or C will block the path from 
a to b.

Figure 2.10 Figure for Problem 2.15

2.16 A spacecraft has 1000 components in series. If the required reliability of the 
spacecraft is 0.9 and if all components function independently and have the same 
reliability, what is the required reliability of each component?

2.17 Automobiles are equipped with redundant braking circuits; their brakes fail only 
when all circuits fail. Consider one with two redundant braking circuits, each 
having a reliability of 0.95. Determine the system reliability assuming that these 
circuits act independently. 

Comment: systems of this type are graphically represented as in Figure 2.11, in
which the circuits (A and B) have a parallel arrangement. The path to success is 
broken only when breakdowns of both A and B occur. 

2.18 On the basis of definitions given in Problems 2.15 and 2.17 for series and parallel 
arrangements of system components, determine reliabilities of the systems 
described by the block diagrams as follows. 

(a) The diagram in Figure 2.12. 

(b) The diagram in Figure 2.13. 




Fundamentals of Probability and Statistics for Engineers 


Figure 2.12 Figure for Problem 2.18(a)

Figure 2.13 Figure for Problem 2.18(b)


2.19 A rifle is fired at a target. Assuming that the probability of scoring a hit is 0.9 for 
each shot and that the shots are independent, compute the probability that, in 
order to score a hit: 

(a) It takes more than two shots. 

(b) The number of shots required is between four and six (inclusive). 

2.20 Events A and B are mutually exclusive. Can they also be independent? Explain. 

2.21 Let P(A) = 0.4, and P(A ∪ B) = 0.7. What is P(B) if:

(a) A and B are independent? 

(b) A and B are mutually exclusive? 

2.22 Let P(A ∪ B) = 0.75, and P(AB) = 0.25. Is it possible to determine P(A) and P(B)? 
Answer the same question if, in addition: 

(a) A and B are independent. 

(b) A and B are mutually exclusive. 
2.23 Events A and B are mutually exclusive. Determine which of the following relations 
are true and which are false:

(a) P(A|B) = P(A).

(b) P(A ∪ B|C) = P(A|C) + P(B|C).

(c) P(A) = 0, P(B) = 0, or both. 

(d) $\dfrac{P(A|B)}{P(B)} = \dfrac{P(B|A)}{P(A)}$.

(e) P(AB) = P(A)P(B).

Repeat the above if events A and B are independent. 

2.24 On a stretch of highway, the probability of an accident due to human error in any 
given minute is 10^^, and the probability of an accident due to mechanical break¬ 
down in any given minute is 10^’. Assuming that these two causes are independent: 

(a) Find the probability of the occurrence of an accident on this stretch of highway 
during any minute. 

(b) In this case, can the above answer be approximated by P(accident due to 
human error) + P(accident due to mechanical failure)? Explain.

(c) If the events in succeeding minutes are mutually independent, what is the 
probability that there will be no accident at this location in a year? 

2.25 Rapid transit trains arrive at a given station every five minutes and depart after 
stopping at the station for one minute to drop off and pick up passengers. Assuming 
trains arrive every hour on the hour, what is the probability that a passenger 
will be able to board a train immediately if he or she arrives at the station at a 
random instant between 7:54 a.m. and 8:06 a.m.?

2.26 A telephone call occurs at random in the interval (0, t). Let T be its time of 
occurrence. Determine, where $0 < t_0 < t_1 < t$:

(a) $P(t_0 < T < t_1)$.

(b) $P(t_0 < T < t_1 | T > t_0)$.

2.27 For a storm-sewer system, estimates of annual maximum flow rates (AMFR) and 
their likelihood of occurrence [assuming that a maximum of 12 cfs (cubic feet per 
second) is possible] are given as follows: 

Event A = (5 to 10 cfs), P(A) = 0.6.

Event B = (8 to 12 cfs), P(B) = 0.6.

Event C = ^ UR, P(C) = 0.7. 

Determine: 

(a) P(8 < AMFR < 10), the probability that the AMFR is between 8 and 10 cfs. 

(b) P(5 < AMFR < 12). 

(c) P(10 < AMFR < 12). 

(d) P(8 < AMFR < 10|5 < AMFR < 10). 

(e) P(5 < AMFR < 10|AMFR > 5).

2.28 At a major and minor street intersection, one finds that, out of every 100 gaps on 
the major street, 65 are acceptable, that is, large enough for a car arriving on the 
minor street to cross. When a vehicle arrives on the minor street: 

(a) What is the probability that the first gap is not an acceptable one? 

(b) What is the probability that the first two gaps are both unacceptable? 

(c) The first car has crossed the intersection. What is the probability that the 
second will be able to cross at the very next gap? 




2.29 A machine part may be selected from any of three manufacturers with probabilities 
$p_1 = 0.25$, $p_2 = 0.50$, and $p_3 = 0.25$. The probabilities that it will function properly
during a specified period of time are 0.2, 0.3, and 0.4, respectively, for the three 
manufacturers. Determine the probability that a randomly chosen machine part 
will function properly for the specified time period. 

2.30 Consider the possible failure of a transportation system to meet demand during 
rush hour. 

(a) Determine the probability that the system will fail if the probabilities shown in 
Table 2.3 are known. 


Table 2.3 Probabilities of demand levels and of system 
failures for the given demand level, for Problem 2.30

    Demand level    P(level)    P(system failure|level)
    Low             0.6         0
    Medium          0.3         0.1
    High            0.1         0.5


(b) If system failure was observed, find the probability that a ‘medium’ demand 
level was its cause. 

2.31 A cancer diagnostic test is 95% accurate both on those who have cancer and on 
those who do not. If 0.005 of the population actually does have cancer, compute 
the probability that a particular individual has cancer, given that the test indicates 
he or she has cancer. 

2.32 A quality control record panel of transistors gives the results shown in Table 2.4 
when classified by manufacturer and quality. 

Let one transistor be selected at random. What is the probability of it being: 

(a) From manufacturer A and with acceptable quality? 

(b) Acceptable given that it is from manufacturer C? 

(c) From manufacturer B given that it is marginal? 


Table 2.4 Quality control results, for Problem 2.32

                                  Quality
    Manufacturer    Acceptable    Marginal    Unacceptable    Total
    A               128           10          2               140
    B               97            5           3               105
    C               110           5           5               120


2.33 Verify Equation (2.26) for three events. 

2.34 In an elementary study of synchronized traffic lights, consider a simple four-light 
system. Suppose that each light is red for 30 seconds of a 50-second cycle, and suppose 

$P(S_{j+1}|S_j) = 0.15$

and 

P(5:;+i|S,-)=0.40 
for j = 1, 2, 3, where $S_j$ is the event that a driver is stopped by the jth light. We 
assume a ‘one-light’ memory for the system. By means of the tree diagram,
determine the probability that a driver: 

(a) Will be delayed by all four lights. 

(b) Will not be delayed by any of the four lights. 

(c) Will be delayed by at most one light.
Random Variables and Probability
Distributions 


We have mentioned that our interest in the study of a random phenomenon is in the 
statements we can make concerning the events that can occur, and these statements 
are made based on probabilities assigned to simple outcomes. Basic concepts have 
been developed in Chapter 2, but a systematic and unified procedure is needed to 
facilitate making these statements, which can be quite complex. One of the immediate 
steps that can be taken in this unifying attempt is to require that each of the 
possible outcomes of a random experiment be represented by a real number. In this 
way, when the experiment is performed, each outcome is identified by its assigned 
real number rather than by its physical description. For example, when the possible 
outcomes of a random experiment consist of success and failure, we arbitrarily assign 
the number one to the event ‘success’ and the number zero to the event ‘failure’. The 
associated sample space now has {1, 0} as its sample points instead of success and 
failure, and the statement ‘the outcome is 1’ means ‘the outcome is success’.

This procedure not only permits us to replace a sample space of arbitrary 
elements by a new sample space having only real numbers as its elements but 
also enables us to use arithmetic methods for probability calculations. Furthermore, 
most problems in science and engineering deal with quantitative measures. 
Consequently, sample spaces associated with many random experiments 
of interest are already themselves sets of real numbers. The real-number assignment 
procedure is thus a natural unifying agent. On this basis, we may introduce 
a variable X, which is used to represent real numbers, the values of which 
are determined by the outcomes of a random experiment. This leads to the 
notion of a random variable, which is defined more precisely below.

3.1 RANDOM VARIABLES 

Consider a random experiment whose outcomes are elements of sample 
space S in the underlying probability space. In order to construct a model for 
a random variable, we assume that it is possible to assign a real number X(s) 
to each outcome s following a certain set of rules. We see that the ‘number’ 
X(s) is really a real-valued point function defined over the domain of the basic 
probability space (see Definition 3.1).

Definition 3.1. The point function X(s) is called a random variable if (a) it is a 
finite real-valued function defined on the sample space S of a random experiment 
for which the probability function is defined, and (b) for every real number x, the 
set {s: X(s) ≤ x} is an event. The relation X = X(s) takes every element s in S of 
the probability space onto a point X on the real line $R^1 = (-\infty, \infty)$.

Notationally, the dependence of random variable X(s) on s will be omitted 
for convenience.

The second condition stated in Definition 3.1 is the so-called ‘measurability 
condition’. It ensures that it is meaningful to consider the probability of event 
X ≤ x for every x, or, more generally, the probability of any finite or countably 
infinite combination of such events.

To see more clearly the role a random variable plays in the study of a random 
phenomenon, consider again the simple example where the possible outcomes 
of a random experiment are success and failure. Let us again assign number one 
to the event success and zero to failure. If X is the random variable associated 
with this experiment, then X takes on two possible values: 1 and 0. Moreover, 
the following statements are equivalent: 

• The outcome is success. 

• The outcome is 1. 

• X = 1.

The random variable X is called a discrete random variable if it is defined 
over a sample space having a finite or a countably infinite number of sample 
points. In this case, random variable X takes on discrete values, and it is 
possible to enumerate all the values it may assume. In the case of a sample 
space having an uncountably infinite number of sample points, the associated 
random variable is called a continuous random variable, with its values distributed 
over one or more continuous intervals on the real line. We make this 
distinction because the two types require different probability assignment 
considerations. Both types of random variables are important in science and engineering 
and we shall see ample evidence of this in the subsequent chapters.

In the following, all random variables will be written in capital letters, 
X, Y, Z, .... The value that a random variable X can assume will be denoted 
by corresponding lower-case letters such as x, y, z, or $x_1, x_2, \ldots$.

We will have many occasions to consider a sequence of random variables 
$X_j$, $j = 1, 2, \ldots, n$. In these cases we assume that they are defined on the same 
probability space. The random variables $X_1, X_2, \ldots, X_n$ will then map every 
element s of S in the probability space onto a point in the n-dimensional 
Euclidean space $R^n$. We note here that an analysis involving n random variables 
is equivalent to considering a random vector having the n random variables as 
its components. The notion of a random vector will be used frequently in what 
follows, and we will denote random vectors by bold capital letters $\mathbf{X}, \mathbf{Y}, \mathbf{Z}, \ldots$.


3.2 PROBABILITY DISTRIBUTIONS 

The behavior of a random variable is characterized by its probability 
distribution, that is, by the way probabilities are distributed over the values it assumes. 
A probability distribution function and a probability mass function are two 
ways to characterize this distribution for a discrete random variable. They are 
equivalent in the sense that the knowledge of either one completely specifies 
the random variable. The corresponding functions for a continuous random 
variable are the probability distribution function, defined in the same way as in 
the case of a discrete random variable, and the probability density function. 
The definitions of these functions now follow. 


3.2.1 PROBABILITY DISTRIBUTION FUNCTION 

Given a random experiment with its associated random variable $X$ and given a 
real number $x$, let us consider the probability of the event $\{\zeta: X(\zeta) \le x\}$, or, 
simply, $P(X \le x)$. This probability is clearly dependent on the assigned value $x$. 
The function 

$$F_X(x) = P(X \le x), \qquad (3.1)$$

is defined as the probability distribution function (PDF), or simply the 
distribution function, of $X$. In Equation (3.1), subscript $X$ identifies the random 
variable. This subscript is sometimes omitted when there is no risk of confusion. 
Let us repeat that $F_X(x)$ is simply $P(A)$, the probability of an event $A$ occurring, 
the event being $X \le x$. This observation ties what we do here with the 
development of Chapter 2. 

The PDF is thus the probability that $X$ will assume a value lying in a subset 
of the real line, the subset being point $x$ and all points lying to the ‘left’ of $x$. As $x$ 
increases, the subset covers more of the real line, and the value of the PDF 
never decreases, approaching 1. The PDF of a random variable thus accumulates 
probability as $x$ increases, and the name cumulative distribution function (CDF) 
is also used for this function. 

In view of the definition and the discussion above, we give below some of the 
important properties possessed by a PDF. 
• It exists for discrete and continuous random variables and has values between 
0 and 1. 

• It is a nonnegative, continuous-from-the-right, and nondecreasing function of the 
real variable $x$. Moreover, we have 

$$F_X(-\infty) = 0, \qquad F_X(+\infty) = 1. \qquad (3.2)$$

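• If $a$ and $b$ are two real numbers such that $a < b$, then 

$$P(a < X \le b) = F_X(b) - F_X(a). \qquad (3.3)$$

This relation is a direct result of the identity 

$$P(X \le b) = P(X \le a) + P(a < X \le b),$$

which holds because the events $X \le a$ and $a < X \le b$ are mutually exclusive and their union is the event $X \le b$. 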

We see from Equation (3.3) that the probability of X having a value in an 
arbitrary interval can be represented by the difference between two values of 
the PDF. Generalizing, probabilities associated with any sets of intervals are 
derivable from the PDF. 

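Example 3.1. Let a discrete random variable $X$ assume values $-1, 1, 2$, and $3$, 
with probabilities $\tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8}$, and $\tfrac{1}{2}$, respectively. We then have 

$$F_X(x) = \begin{cases} 0, & \text{for } x < -1; \\ \tfrac{1}{4}, & \text{for } -1 \le x < 1; \\ \tfrac{3}{8}, & \text{for } 1 \le x < 2; \\ \tfrac{1}{2}, & \text{for } 2 \le x < 3; \\ 1, & \text{for } x \ge 3. \end{cases}$$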

This function is plotted in Figure 3.1. It is typical of PDFs associated with 
discrete random variables, increasing from 0 to 1 in a ‘staircase’ fashion. 

A continuous random variable assumes a nonenumerable number of values 
over the real line. Hence, the probability of a continuous random variable 
assuming any particular value is zero, and therefore no discrete jumps are 
possible for its PDF. A typical PDF for continuous random variables is 
shown in Figure 3.2. Unlike the PDF of a discrete random variable, it has no 
jumps or discontinuities. The probability of $X$ having a value in a given 
interval is found by using Equation (3.3), and it makes sense to speak only of 
this kind of probability for continuous random variables. For example, in 
Figure 3.2, 

$$P(-1 < X \le 1) = F_X(1) - F_X(-1) = 0.8 - 0.4 = 0.4.$$

Clearly, $P(X = a) = 0$ for any $a$. 


Figure 3.1 Probability distribution function of $X$, $F_X(x)$, for Example 3.1 

Figure 3.2 Probability distribution function of a continuous random variable $X$, $F_X(x)$ 


3.2.2 PROBABILITY MASS FUNCTION FOR DISCRETE RANDOM 
VARIABLES 

Let $X$ be a discrete random variable that assumes at most a countably infinite 
number of values $x_1, x_2, \ldots$ with nonzero probabilities. If we denote 
$P(X = x_i) = p(x_i)$, $i = 1, 2, \ldots$, then, clearly, 

$$0 < p(x_i) \le 1, \quad \text{for all } i; \qquad \sum_i p(x_i) = 1. \qquad (3.4)$$

Figure 3.3 Probability mass function of $X$, $p_X(x)$, for the random variable defined in Example 3.1 


Definition 3.2. The function 

$$p_X(x) = P(X = x), \qquad (3.5)$$

is defined as the probability mass function (pmf) of $X$. Again, the subscript $X$ is 
used to identify the associated random variable. 

For the random variable defined in Example 3.1, the pmf is zero everywhere 
except at $x_i$, $i = 1, 2, \ldots$, and has the appearance shown in Figure 3.3. 

This is a typical shape of pmf for a discrete random variable. Since 
$P(X = x) = 0$ for any $x$ for continuous random variables, the pmf does not exist in 
the case of the continuous random variable. We also observe that, like $F_X(x)$, 
the specification of $p_X(x)$ completely characterizes random variable $X$; 
furthermore, these two functions are simply related by 

$$p_X(x_i) = F_X(x_i) - F_X(x_{i-1}), \qquad (3.6)$$

$$F_X(x) = \sum_{i:\,x_i \le x} p_X(x_i), \qquad (3.7)$$

(assuming $x_1 < x_2 < \cdots$). The upper limit for the sum in Equation (3.7) means that the sum is taken 
over all $i$ satisfying $x_i \le x$. Hence, we see that the PDF and pmf of a discrete 
random variable contain the same information; each one is recoverable from 
the other. 
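The two relations above can be exercised directly. The short Python sketch below, offered only as an illustration, builds the staircase PDF of Example 3.1 from its pmf with Equation (3.7) and then recovers the pmf from the PDF with Equation (3.6). It assumes the values $-1, 1, 2, 3$ with probabilities $1/4, 1/8, 1/8, 1/2$ and uses exact fractions so that the round trip can be compared without rounding error. The function names `cdf` and `pmf_from_cdf` are hypothetical.

```python
from fractions import Fraction

# Values and probabilities of Example 3.1, sorted so that x_1 < x_2 < ...
values = [-1, 1, 2, 3]
probs = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 8), Fraction(1, 2)]

def cdf(x):
    """F_X(x): sum of p_X(x_i) over all i with x_i <= x (Equation 3.7)."""
    return sum((p for xi, p in zip(values, probs) if xi <= x), Fraction(0))

def pmf_from_cdf(points, F):
    """p_X(x_i) = F_X(x_i) - F_X(x_{i-1}) for sorted points (Equation 3.6)."""
    recovered = []
    previous = Fraction(0)  # F_X is 0 everywhere below x_1
    for xi in points:
        current = F(xi)
        recovered.append(current - previous)
        previous = current
    return recovered

# The staircase: constant between the x_i, jumping by p_X(x_i) at each x_i.
print([str(cdf(x)) for x in (-2, -1, 0, 1, 2.5, 3, 10)])
print(pmf_from_cdf(values, cdf) == probs)
```

Exact rational arithmetic keeps the check in the last line free of floating-point noise; with floats, the comparison would need a tolerance.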
One can also give PDF and pmf a useful physical interpretation. In terms of 
the distribution of one unit of mass over the real line $-\infty < x < \infty$, the PDF of 
a random variable at $x$, $F_X(x)$, can be interpreted as the total mass associated 
with point $x$ and all points lying to the left of $x$. The pmf, in contrast, shows 
the distribution of this unit of mass over the real line; it is distributed at discrete 
points $x_i$, with the amount of mass equal to $p_X(x_i)$ at $x_i$, $i = 1, 2, \ldots$. 

Example 3.2. A discrete distribution arising in a large number of physical 
models is the binomial distribution. Much more will be said of this important 
distribution in Chapter 6 but, at present, let us use it as an illustration for 
graphing the PDF and pmf of a discrete random variable. 

A discrete random variable $X$ has a binomial distribution when 

$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n, \qquad (3.8)$$

where $n$ and $p$ are two parameters of the distribution, $n$ being a positive integer, 
and $0 < p < 1$. The binomial coefficient $\binom{n}{k}$ is defined by 

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \qquad (3.9)$$

The pmf and PDF of $X$ for $n = 10$ and $p = 0.2$ are plotted in Figure 3.4. 


Figure 3.4 (a) Probability mass function, $p_X(x)$, and (b) probability distribution 
function, $F_X(x)$, for the discrete random variable $X$ described in Example 3.2 
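For readers who wish to reproduce the numbers behind Figure 3.4, here is a minimal Python sketch that uses only the standard library (`math.comb` requires Python 3.8 or later). It tabulates the binomial pmf of Equation (3.8) for $n = 10$, $p = 0.2$ and obtains the PDF at the integers as the running sum of the pmf, as in Equation (3.7). The helper `binomial_pmf` is an illustrative name, not part of any library.

```python
from itertools import accumulate
from math import comb

def binomial_pmf(k, n, p):
    """p_X(k) = C(n, k) p^k (1 - p)^(n - k), Equation (3.8)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.2  # the parameter values used for Figure 3.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The PDF at the integers is the running sum of the pmf, as in Equation (3.7);
# between integers it stays constant (the staircase of Figure 3.4(b)).
cdf = list(accumulate(pmf))

for k in range(n + 1):
    print(f"k = {k:2d}   p_X(k) = {pmf[k]:.4f}   F_X(k) = {cdf[k]:.4f}")
```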
3.2.3 PROBABILITY DENSITY FUNCTION FOR CONTINUOUS 
RANDOM VARIABLES 

For a continuous random variable $X$, its PDF, $F_X(x)$, is a continuous function 
of $x$, and the derivative 

$$f_X(x) = \frac{dF_X(x)}{dx} \qquad (3.10)$$

exists for all $x$. The function $f_X(x)$ is called the probability density function (pdf), 
or simply the density function, of $X$.¹ 

Since $F_X(x)$ is monotone nondecreasing, we clearly have 

$$f_X(x) \ge 0 \quad \text{for all } x. \qquad (3.11)$$

Additional properties of $f_X(x)$ can be derived easily from Equation (3.10); 
these include 

$$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du, \qquad (3.12)$$

and 

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1, \qquad P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx. \qquad (3.13)$$


A typical pdf has the shape shown in Figure 3.5. As indicated by 
Equations (3.13), the total area under the curve is unity, and the shaded area 
from $a$ to $b$ gives the probability $P(a < X \le b)$. We again observe that 
knowledge of either the pdf or the PDF completely characterizes a continuous random 
variable. The pdf does not exist for a discrete random variable, since its 
associated PDF has discrete jumps and is not differentiable at these points of 
discontinuity. 

Using the mass distribution analogy, the pdf of a continuous random variable 
plays exactly the same role as the pmf of a discrete random variable. The 
function $f_X(x)$ can be interpreted as the mass density (mass per unit length). 
There are no masses attached to discrete points as in the discrete random 
variable case. The use of the term density function is therefore appropriate here 
for $f_X(x)$. 

Figure 3.5 A probability density function, $f_X(x)$ 

¹Note the use of upper-case and lower-case letters, PDF and pdf, to represent the distribution and 
density functions, respectively. 

Example 3.3. A random variable $X$ for which the density function has the 
form ($a > 0$) 

$$f_X(x) = \begin{cases} a e^{-ax}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere}; \end{cases} \qquad (3.14)$$

is said to be exponentially distributed. We can easily check that all the 
conditions given by Equations (3.11)-(3.13) are satisfied. The pdf is presented 
graphically in Figure 3.6(a), and the associated PDF is shown in Figure 3.6(b). 
The functional form of the PDF as obtained from Equation (3.12) is 

$$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = \begin{cases} 0, & \text{for } x < 0; \\ 1 - e^{-ax}, & \text{for } x \ge 0. \end{cases} \qquad (3.15)$$

Figure 3.6 (a) Probability density function, $f_X(x)$, and (b) probability distribution 
function, $F_X(x)$, for random variable $X$ in Example 3.3 


Let us compute some of the probabilities using $f_X(x)$. The probability 
$P(0 < X \le 1)$ is numerically equal to the area under $f_X(x)$ from $x = 0$ to 
$x = 1$, as shown in Figure 3.6(a). It is given by 

$$P(0 < X \le 1) = \int_0^1 a e^{-ax}\,dx = 1 - e^{-a}.$$

The probability $P(X > 3)$ is obtained by computing the area under $f_X(x)$ to the 
right of $x = 3$. Hence, 

$$P(X > 3) = \int_3^{\infty} a e^{-ax}\,dx = e^{-3a}.$$

The same probabilities can be obtained from $F_X(x)$ by taking appropriate 
differences, giving: 

$$P(0 < X \le 1) = F_X(1) - F_X(0) = (1 - e^{-a}) - 0 = 1 - e^{-a},$$

$$P(X > 3) = F_X(\infty) - F_X(3) = 1 - (1 - e^{-3a}) = e^{-3a}.$$

Let us note that there is no numerical difference between $P(0 < X \le 1)$ and 
$P(0 \le X \le 1)$ for continuous random variables, since $P(X = 0) = 0$. 
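The agreement between the area computation and the PDF differences can also be checked numerically. The Python sketch below assumes the hypothetical value $a = 0.5$ (the text leaves $a$ symbolic). It approximates the areas under $f_X(x)$ with a simple midpoint rule and compares them with the $F_X(x)$ differences and the closed forms $1 - e^{-a}$ and $e^{-3a}$. The midpoint rule and the truncation of the infinite upper limit are choices made for this illustration only.

```python
import math

a = 0.5  # hypothetical parameter value; the text keeps a symbolic (a > 0)

def f_X(x):
    """Exponential pdf, Equation (3.14)."""
    return a * math.exp(-a * x) if x >= 0 else 0.0

def F_X(x):
    """Exponential PDF, Equation (3.15)."""
    return 1.0 - math.exp(-a * x) if x >= 0 else 0.0

def area(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the area under f between lo and hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

# P(0 < X <= 1): area under the pdf, difference of PDF values, closed form.
print(area(f_X, 0.0, 1.0), F_X(1.0) - F_X(0.0), 1.0 - math.exp(-a))

# P(X > 3): the infinite upper limit is truncated where the tail is negligible.
upper = 3.0 + 60.0 / a
print(area(f_X, 3.0, upper), 1.0 - F_X(3.0), math.exp(-3.0 * a))
```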


3.2.4 MIXED-TYPE DISTRIBUTION 

There are situations in which one encounters a random variable that is partially 
discrete and partially continuous. The PDF given in Figure 3.7 represents such 
a case, in which random variable $X$ is continuously distributed over the real line 
except at $x = 0$, where $P(X = 0)$ is a positive quantity. This situation may arise 
when, for example, random variable $X$ represents the waiting time of a customer 
at a ticket counter. Let $X$ be the time interval from the time of arrival at the ticket 
counter to the time of being served. It is reasonable to expect that $X$ will assume 
values over the interval $x \ge 0$. At $x = 0$, however, there is a positive probability of 
not having to wait at all, giving rise to the situation depicted in Figure 3.7. 

Figure 3.7 A mixed-type probability distribution function, $F_X(x)$ 

Strictly speaking, neither a pmf nor a pdf exists for a random variable of the 
mixed type. We can, however, still use them separately for different portions of 
the distribution, for computational purposes. Let $f_X(x)$ be the pdf for the 
continuous portion of the distribution. It can be used for calculating 
probabilities in the positive range of $x$ values for this example. We observe that the total 
area under the pdf curve is no longer 1 but is equal to $1 - P(X = 0)$. 

Example 3.4. Problem: since it is more economical to limit long-distance 
telephone calls to three minutes or less, the PDF of $X$, the duration in minutes 
of long-distance calls, may be of the form 

$$F_X(x) = \begin{cases} 0, & \text{for } x < 0; \\ 1 - e^{-x/3}, & \text{for } 0 \le x < 3; \\ 1 - \tfrac{1}{2} e^{-x/3}, & \text{for } x \ge 3. \end{cases}$$

Determine the probability that $X$ is (a) more than two minutes and (b) between 
two and six minutes. 

Answer: the PDF of $X$ is plotted in Figure 3.8, showing that $X$ has a 
mixed-type distribution. The desired probabilities can be found from the PDF as 
before. Hence, for part (a), 

$$P(X > 2) = 1 - P(X \le 2) = 1 - F_X(2) = 1 - (1 - e^{-2/3}) = e^{-2/3}.$$

Figure 3.8 Probability distribution function, $F_X(x)$, of $X$, as described in Example 3.4 
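For part (b), 

$$P(2 < X \le 6) = F_X(6) - F_X(2) = \left(1 - \tfrac{1}{2} e^{-2}\right) - \left(1 - e^{-2/3}\right) = e^{-2/3} - \tfrac{1}{2} e^{-2}.$$

Figure 3.9 shows the partial pmf $p_X(x)$ for the discrete portion and the partial pdf $f_X(x)$ for the continuous 
portion of $X$. They are given by 

$$p_X(x) = \begin{cases} \tfrac{1}{2} e^{-1}, & \text{for } x = 3; \\ 0, & \text{elsewhere}; \end{cases}$$

and 

$$f_X(x) = \frac{dF_X(x)}{dx} = \begin{cases} 0, & \text{for } x < 0; \\ \tfrac{1}{3} e^{-x/3}, & \text{for } 0 \le x < 3; \\ \tfrac{1}{6} e^{-x/3}, & \text{for } x > 3. \end{cases}$$

Figure 3.9 (a) Partial probability mass function, $p_X(x)$, and (b) partial probability 
density function, $f_X(x)$, of $X$, as described in Example 3.4 

Note again that the area under $f_X(x)$ is no longer one but is $1 - \tfrac{1}{2} e^{-1}$. 

To obtain $P(X > 2)$ and $P(2 < X \le 6)$, both the discrete and continuous 
portions come into play, and we have, for part (a), 

$$P(X > 2) = \int_2^{\infty} f_X(x)\,dx + p_X(3) = \frac{1}{3}\int_2^3 e^{-x/3}\,dx + \frac{1}{6}\int_3^{\infty} e^{-x/3}\,dx + \frac{1}{2e} = e^{-2/3},$$

and, for part (b), 

$$P(2 < X \le 6) = \int_2^6 f_X(x)\,dx + p_X(3) = \frac{1}{3}\int_2^3 e^{-x/3}\,dx + \frac{1}{6}\int_3^6 e^{-x/3}\,dx + \frac{1}{2e} = e^{-2/3} - \frac{1}{2} e^{-2}.$$

These results are, of course, the same as those obtained earlier using the PDF. 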
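Both routes through Example 3.4 can be compared in a few lines. The following Python sketch evaluates parts (a) and (b) twice: once from the PDF, and once as the partial-pdf area plus the point mass at $x = 3$. The numerical integration is an assumption of this sketch, not the book's method, which integrates in closed form. It applies a midpoint rule separately on each side of the jump and uses $x = 300$ in place of $+\infty$.

```python
import math

def F_X(x):
    """Mixed-type PDF of Example 3.4 (call duration in minutes)."""
    if x < 0:
        return 0.0
    if x < 3:
        return 1.0 - math.exp(-x / 3)
    return 1.0 - 0.5 * math.exp(-x / 3)

def f_X(x):
    """Partial pdf: the derivative of F_X away from the jump at x = 3."""
    if x < 0:
        return 0.0
    if x < 3:
        return math.exp(-x / 3) / 3
    return math.exp(-x / 3) / 6

p_at_3 = 0.5 * math.exp(-1)  # partial pmf: size of the jump of F_X at x = 3

def area(f, lo, hi, steps=100_000):
    """Midpoint-rule integral of f over (lo, hi); callers keep the jump outside."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

# (a) P(X > 2); the upper limit 300 stands in for infinity (tail about e^-100).
part_a_from_F = 1.0 - F_X(2)
part_a_split = area(f_X, 2, 3) + area(f_X, 3, 300) + p_at_3

# (b) P(2 < X <= 6)
part_b_from_F = F_X(6) - F_X(2)
part_b_split = area(f_X, 2, 3) + area(f_X, 3, 6) + p_at_3

print(part_a_from_F, part_a_split, math.exp(-2 / 3))
print(part_b_from_F, part_b_split, math.exp(-2 / 3) - 0.5 * math.exp(-2))
```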


3.3 TWO OR MORE RANDOM VARIABLES 

In many cases it is more natural to describe the outcome of a random 
experiment by two or more numbers simultaneously. Examples include the 
characterization of both weight and height in a given population, the study of 
temperature and pressure variations in a physical experiment, and the 
distribution of monthly temperature readings in a given region over a given year. In 
these situations, two or more random variables are considered jointly and the 
description of their joint behavior is our concern. 

Let us first consider the case of two random variables $X$ and $Y$. We proceed 
analogously to the single random variable case in defining their joint 
probability distributions. We note that random variables $X$ and $Y$ can also be 
considered as components of a two-dimensional random vector, say $\mathbf{Z}$. Joint 
probability distributions associated with two random variables are sometimes 
called bivariate distributions. 

As we shall see, extensions to cases involving more than two random 
variables, or multivariate distributions, are straightforward. 


3.3.1 JOINT PROBABILITY DISTRIBUTION FUNCTION 

The joint probability distribution function (JPDF) of random variables $X$ and $Y$, 
denoted by $F_{XY}(x,y)$, is defined by 

$$F_{XY}(x,y) = P(X \le x \cap Y \le y), \qquad (3.16)$$

for all $x$ and $y$. It is the probability of the intersection of two events; random 
variables $X$ and $Y$ thus induce a probability distribution over a two-dimensional 
Euclidean plane. 
Using again the mass distribution analogy, let one unit of mass be distributed 
over the $(x,y)$ plane in such a way that the mass in any given region $R$ is equal 
to the probability that $X$ and $Y$ take values in $R$. Then JPDF $F_{XY}(x,y)$ 
represents the total mass in the quadrant to the left of and below the point 
$(x,y)$, inclusive of the boundaries. In the case where both $X$ and $Y$ are discrete, 
all the mass is concentrated at a finite or countably infinite number of points in 
the $(x,y)$ plane as point masses. When both are continuous, the mass is 
distributed continuously over the $(x,y)$ plane. 

It is clear from the definition that $F_{XY}(x,y)$ is nonnegative, nondecreasing in 
$x$ and $y$, and continuous from the right with respect to $x$ and $y$. The following 
properties are also a direct consequence of the definition: 

$$F_{XY}(-\infty,-\infty) = F_{XY}(-\infty,y) = F_{XY}(x,-\infty) = 0,$$

$$F_{XY}(+\infty,+\infty) = 1,$$

$$F_{XY}(x,+\infty) = F_X(x),$$

$$F_{XY}(+\infty,y) = F_Y(y). \qquad (3.17)$$

For example, the third relation above follows from the fact that the joint event 
$X \le x \cap Y \le +\infty$ is the same as the event $X \le x$, since $Y \le +\infty$ is a sure event. 
Hence, 

$$F_{XY}(x,+\infty) = P(X \le x \cap Y \le +\infty) = P(X \le x) = F_X(x).$$

Similarly, we can show that, for any $x_1, x_2, y_1$, and $y_2$ such that $x_1 < x_2$ and 
$y_1 < y_2$, the probability $P(x_1 < X \le x_2 \cap y_1 < Y \le y_2)$ is given in terms of 
$F_{XY}(x,y)$ by 

$$P(x_1 < X \le x_2 \cap y_1 < Y \le y_2) = F_{XY}(x_2,y_2) - F_{XY}(x_1,y_2) - F_{XY}(x_2,y_1) + F_{XY}(x_1,y_1), \qquad (3.18)$$

which shows that all probability calculations involving random variables $X$ and 
$Y$ can be made with the knowledge of their JPDF. 
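Equation (3.18) needs only four values of the JPDF. To illustrate, the Python sketch below applies it to a hypothetical JPDF, $F_{XY}(x,y) = (1 - e^{-x})(1 - e^{-y})$ for $x, y \ge 0$. This corresponds to two independent unit exponential variables and is chosen only because its rectangle probabilities are easy to verify by hand. The sketch also checks the marginal relation $F_{XY}(x,+\infty) = F_X(x)$ from Equations (3.17). The function names are illustrative.

```python
import math

def F_XY(x, y):
    """Hypothetical JPDF for illustration: X and Y independent unit exponentials."""
    if x < 0 or y < 0:
        return 0.0
    return (1.0 - math.exp(-x)) * (1.0 - math.exp(-y))

def rectangle_probability(F, x1, x2, y1, y2):
    """P(x1 < X <= x2, y1 < Y <= y2) from the JPDF alone (Equation 3.18)."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

prob = rectangle_probability(F_XY, 0.5, 2.0, 1.0, 3.0)

# For this particular F_XY the answer factorizes, which gives an independent check.
check = (math.exp(-0.5) - math.exp(-2.0)) * (math.exp(-1.0) - math.exp(-3.0))
print(prob, check)

# A marginal PDF is recovered by sending the other argument to +infinity (Eq. 3.17).
print(F_XY(1.0, math.inf), 1.0 - math.exp(-1.0))
```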

Finally, we note that the last two equations in Equations (3.17) show that 
distribution functions of individual random variables are directly derivable 
from their joint distribution function. The converse is not true in general: the 
individual distribution functions do not, by themselves, determine the joint 
distribution function. In the context of several random variables, these individual distribution functions 
are called marginal distribution functions. For example, $F_X(x)$ is the marginal 
distribution function of $X$. 

The general shape of $F_{XY}(x,y)$ can be visualized from the properties given in 
Equations (3.17). In the case where $X$ and $Y$ are discrete, it has the appearance of 
a corner of an irregular staircase, something like that shown in Figure 3.10. It rises 
from zero to the height of one in the direction moving from the third quadrant to the 
first quadrant. When both $X$ and $Y$ are continuous, $F_{XY}(x,y)$ becomes a smooth 
surface with the same features. It is a staircase type in one direction and smooth in 
the other if one of the random variables is discrete and the other continuous. 

Figure 3.10 A joint probability distribution function of $X$ and $Y$, $F_{XY}(x,y)$, when $X$ and 
$Y$ are discrete 

The joint probability distribution function of more than two random 
variables is defined in a similar fashion. Consider $n$ random variables 
$X_1, X_2, \ldots, X_n$. Their JPDF is defined by 

$$F_{X_1X_2\cdots X_n}(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1 \cap X_2 \le x_2 \cap \cdots \cap X_n \le x_n). \qquad (3.19)$$

These random variables induce a probability distribution in an $n$-dimensional 
Euclidean space. One can deduce immediately its properties in parallel to those 
noted in Equations (3.17) and (3.18) for the two-random-variable case. 

As we have mentioned previously, a finite number of random variables 
$X_j$, $j = 1, 2, \ldots, n$, may be regarded as the components of an $n$-dimensional 
random vector $\mathbf{X}$. The JPDF of $\mathbf{X}$ is identical to that given above, but it can 
be written in a more compact form, namely, $F_{\mathbf{X}}(\mathbf{x})$, where $\mathbf{x}$ is the vector with 
components $x_1, x_2, \ldots, x_n$. 

3.3.2 JOINT PROBABILITY MASS FUNCTION 

The joint probability mass function (jpmf) is another, and more direct, 
characterization of the joint behavior of two or more random variables when they are 
discrete. Let $X$ and $Y$ be two discrete random variables that assume at most 
a countably infinite number of value pairs $(x_i, y_j)$, $i, j = 1, 2, \ldots$, with nonzero 
probabilities. The jpmf of $X$ and $Y$ is defined by 

$$p_{XY}(x,y) = P(X = x \cap Y = y), \qquad (3.20)$$

for all $x$ and $y$. It is zero everywhere except at points $(x_i, y_j)$, $i, j = 1, 2, \ldots$, 
where it takes values equal to the joint probability $P(X = x_i \cap Y = y_j)$. We 
observe the following properties, which are direct extensions of those noted in 
Equations (3.4), (3.6), and (3.7) for the single-random-variable case: 

$$0 < p_{XY}(x_i,y_j) \le 1,$$

$$\sum_i \sum_j p_{XY}(x_i,y_j) = 1,$$

$$\sum_i p_{XY}(x_i,y) = p_Y(y),$$

$$\sum_j p_{XY}(x,y_j) = p_X(x), \qquad (3.21)$$

where $p_X(x)$ and $p_Y(y)$ are now called marginal probability mass functions. We 
also have 

$$F_{XY}(x,y) = \sum_{i:\,x_i \le x} \; \sum_{j:\,y_j \le y} p_{XY}(x_i,y_j). \qquad (3.22)$$


Example 3.5. Problem: consider a simplified version of a two-dimensional 
‘random walk’ problem. We imagine a particle that moves in a plane in unit 
steps starting from the origin. Each step is one unit in the positive direction, with 
probability $p$ along the $x$ axis and probability $q$ ($p + q = 1$) along the $y$ axis. We 
further assume that each step is taken independently of the others. What is the 
probability distribution of the position of this particle after five steps? 

Answer: since the position is conveniently represented by two coordinates, 
we wish to establish $p_{XY}(x,y)$, where random variable $X$ represents the $x$ 
coordinate of the position after five steps and $Y$ represents the $y$ 
coordinate. It is clear that jpmf $p_{XY}(x,y)$ is zero everywhere except at those points 
satisfying $x + y = 5$ and $x, y \ge 0$. Invoking the independence of events of 
taking successive steps, it follows from Section 3.3 that $p_{XY}(5,0)$, the 
probability of the particle being at $(5,0)$ after five steps, is the product of probabilities of 
taking five successive steps in the positive $x$ direction. Hence 

$$p_{XY}(5,0) = p^5.$$

For $p_{XY}(4,1)$, there are five distinct ways of reaching that position (4 steps in 
the $x$ direction and 1 in $y$; 3 in the $x$ direction, 1 in $y$, and 1 in the $x$ direction; 
and so on), each with a probability of $p^4 q$. We thus have 

$$p_{XY}(4,1) = 5p^4 q.$$

Similarly, other nonvanishing values of $p_{XY}(x,y)$ are easily calculated to be 

$$p_{XY}(x,y) = \begin{cases} 10p^3q^2, & \text{for } (x,y) = (3,2); \\ 10p^2q^3, & \text{for } (x,y) = (2,3); \\ 5pq^4, & \text{for } (x,y) = (1,4); \\ q^5, & \text{for } (x,y) = (0,5). \end{cases}$$

The jpmf $p_{XY}(x,y)$ is graphically presented in Figure 3.11 for $p = 0.4$ and 
$q = 0.6$. It is easy to check that the sum of $p_{XY}(x,y)$ over all $x$ and $y$ is 1, as 
required by the second of Equations (3.21). 

Let us note that the marginal probability mass functions of $X$ and $Y$ are, 
following the last two expressions in Equations (3.21), 

$$p_X(x) = \sum_j p_{XY}(x,y_j) = \begin{cases} q^5, & \text{for } x = 0; \\ 5pq^4, & \text{for } x = 1; \\ 10p^2q^3, & \text{for } x = 2; \\ 10p^3q^2, & \text{for } x = 3; \\ 5p^4q, & \text{for } x = 4; \\ p^5, & \text{for } x = 5; \end{cases}$$

and 

$$p_Y(y) = \sum_i p_{XY}(x_i,y) = \begin{cases} p^5, & \text{for } y = 0; \\ 5p^4q, & \text{for } y = 1; \\ 10p^3q^2, & \text{for } y = 2; \\ 10p^2q^3, & \text{for } y = 3; \\ 5pq^4, & \text{for } y = 4; \\ q^5, & \text{for } y = 5. \end{cases}$$

These are the marginal pmfs of $X$ and $Y$. 

Figure 3.11 The joint probability mass function, $p_{XY}(x,y)$, for Example 3.5, with 
$p = 0.4$ and $q = 0.6$ 

The joint probability distribution function $F_{XY}(x,y)$ can also be constructed 
by using Equation (3.22). Rather than showing it in three-dimensional form, 
Figure 3.12 gives this function by indicating its value in each of the regions into 
which the $(x,y)$ plane is divided. One should also note that the arrays of numbers indicated beyond 
$y = 5$ are values associated with the marginal distribution function $F_X(x)$. 
Similarly, $F_Y(y)$ takes those values situated beyond $x = 5$. These observations 
are also indicated on the graph. 
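The construction of the jpmf, the marginal pmfs, and the JPDF of Example 3.5 can be sketched in Python as follows. The sketch assumes $p = 0.4$ and $q = 0.6$, as in Figures 3.11 and 3.12. It counts the orderings of the five steps with the binomial coefficient $\binom{5}{x}$, which reproduces the coefficients 1, 5, 10, 10, 5, 1 found above. It also confirms that the values of $F_{XY}(x,y)$ beyond $y = 5$ coincide with $F_X(x)$, and it evaluates the event $X > Y$ treated in the text by summing the jpmf over the pairs that realize it. All names are illustrative.

```python
from math import comb

p, q = 0.4, 0.6  # values used for Figures 3.11 and 3.12
steps = 5

# p_XY(x, y) is nonzero only on x + y = 5; C(5, x) orderings put x of the steps along x.
jpmf = {(x, steps - x): comb(steps, x) * p**x * q ** (steps - x) for x in range(steps + 1)}

# Marginal pmfs: sum the jpmf over the other coordinate (last two of Equations 3.21).
p_X = {x: sum(v for (xi, _), v in jpmf.items() if xi == x) for x in range(steps + 1)}
p_Y = {y: sum(v for (_, yj), v in jpmf.items() if yj == y) for y in range(steps + 1)}

def F_XY(x, y):
    """JPDF by Equation (3.22): sum p_XY over points with x_i <= x and y_j <= y."""
    return sum(v for (xi, yj), v in jpmf.items() if xi <= x and yj <= y)

# Beyond y = 5 the JPDF reduces to the marginal PDF of X (cf. Figure 3.12).
F_X = {x: sum(p_X[k] for k in range(x + 1)) for x in range(steps + 1)}
print([round(F_XY(x, 6), 5) for x in range(steps + 1)])
print([round(F_X[x], 5) for x in range(steps + 1)])

# Probability of an event: sum p_XY over the pairs that realize it, here X > Y.
print(round(sum(v for (x, y), v in jpmf.items() if x > y), 5))
```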

The knowledge of the joint probability mass function permits us to make all 
probability calculations of interest. The probability of any event involving $X$ and $Y$ 
is found by determining the pairs of values of $X$ and $Y$ that 
give rise to this event and then simply summing the values of $p_{XY}(x,y)$ over 
all such pairs. In Example 3.5, suppose we wish to determine the probability of 
$X > Y$; it is given by 

$$P(X > Y) = P(X = 5 \cap Y = 0) + P(X = 4 \cap Y = 1) + P(X = 3 \cap Y = 2) = 0.01024 + 0.0768 + 0.2304 = 0.31744.$$

Figure 3.12 The joint probability distribution function, $F_{XY}(x,y)$, for Example 3.5, 
with $p = 0.4$ and $q = 0.6$ 

Table 3.1 Joint probability mass function for low, medium, and high precipitation 
levels ($x$ = 1, 2, and 3, respectively) and critical and noncritical peak flow rates ($y$ = 1 
and 2, respectively), for Example 3.6 

              x = 1    x = 2    x = 3
    y = 1     0.0      0.06     0.12
    y = 2     0.5      0.24     0.08

Example 3.6. Let us discuss again Example 2.11 in the context of random 
variables. Let $X$ be the random variable representing precipitation levels, with 
values 1, 2, and 3 indicating low, medium, and high, respectively. The random 
variable $Y$ will be used for the peak flow rate, with the value 1 when it is critical 
and 2 when noncritical. The information given in Example 2.11 defines jpmf 
$p_{XY}(x,y)$, the values of which are tabulated in Table 3.1. 

In order to determine the probability of reaching the critical level of peak 
flow rate, for example, we simply sum over all $p_{XY}(x,y)$ satisfying $y = 1$, 
regardless of $x$ values. Hence, we have 

$$P(Y = 1) = p_{XY}(1,1) + p_{XY}(2,1) + p_{XY}(3,1) = 0.0 + 0.06 + 0.12 = 0.18.$$
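The marginal sums of Table 3.1 can be sketched in Python as below. The dictionary layout keyed by $(x, y)$ and the helper `marginal` are assumptions made for this illustration. Summing over $x$ for $y = 1$ gives $P(Y = 1)$, and summing over $y$ gives the marginal pmf of $X$.

```python
# Joint pmf of Table 3.1: keys are (x, y) with x = 1, 2, 3 (low, medium, high
# precipitation) and y = 1, 2 (critical, noncritical peak flow rate).
jpmf = {
    (1, 1): 0.00, (2, 1): 0.06, (3, 1): 0.12,
    (1, 2): 0.50, (2, 2): 0.24, (3, 2): 0.08,
}

def marginal(table, axis):
    """Sum the jpmf over the other variable (last two of Equations 3.21)."""
    out = {}
    for pair, prob in table.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + prob
    return out

p_X = marginal(jpmf, 0)  # marginal pmf of precipitation level
p_Y = marginal(jpmf, 1)  # marginal pmf of peak flow rate
print("P(Y = 1) =", round(p_Y[1], 4))
print("p_X =", {x: round(v, 4) for x, v in p_X.items()})
```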

The definition of jpmf for more than two random variables is a direct extension 
of that for the two-random-variable case. Consider $n$ random variables 
$X_1, X_2, \ldots, X_n$. Their jpmf is defined by 

$$p_{X_1X_2\cdots X_n}(x_1, x_2, \ldots, x_n) = P(X_1 = x_1 \cap X_2 = x_2 \cap \cdots \cap X_n = x_n), \qquad (3.23)$$

which is the probability of the intersection of $n$ events. Its properties and 
utilities follow directly from our discussion in the two-random-variable case. 
Again, a more compact form for the jpmf is $p_{\mathbf{X}}(\mathbf{x})$, where $\mathbf{X}$ is an $n$-dimensional 
random vector with components $X_1, X_2, \ldots, X_n$. 


3.3.3 JOINT PROBABILITY DENSITY FUNCTION 

As in the case of single random variables, probability density functions become 
appropriate when the random variables are continuous. The joint probability 
density function (jpdf) of two random variables, $X$ and $Y$, is defined by the 
partial derivative 

$$f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y}. \qquad (3.24)$$

Since $F_{XY}(x,y)$ is nondecreasing in both $x$ and $y$ and, by Equation (3.18), assigns a 
nonnegative probability to every rectangle, $f_{XY}(x,y)$ is nonnegative for all $x$ and $y$. We also see from Equation (3.24) that 

$$F_{XY}(x,y) = P(X \le x \cap Y \le y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(u,v)\,du\,dv. \qquad (3.25)$$

Moreover, with $x_1 < x_2$ and $y_1 < y_2$, 

$$P(x_1 < X \le x_2 \cap y_1 < Y \le y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy. \qquad (3.26)$$

The jpdf defines a surface over the $(x,y)$ plane. As indicated by 
Equation (3.26), the probability that random variables $X$ and $Y$ fall within a 
certain region $R$ is equal to the volume under the surface of $f_{XY}(x,y)$ and 
bounded by that region. This is illustrated in Figure 3.13. 
We also note the following important properties: 

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1, \qquad (3.27)$$

$$\int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = f_X(x), \qquad (3.28)$$

$$\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx = f_Y(y). \qquad (3.29)$$

Equation (3.27) follows from Equation (3.25) by letting $(x,y) \to (+\infty,+\infty)$, and 
this shows that the total volume under the $f_{XY}(x,y)$ surface is unity. To give 
a derivation of Equation (3.28), we know that 

$$F_X(x) = F_{XY}(x,+\infty) = \int_{-\infty}^{\infty}\int_{-\infty}^{x} f_{XY}(u,y)\,du\,dy.$$

Differentiating the above with respect to $x$ gives the desired result immediately. 
The density functions $f_X(x)$ and $f_Y(y)$ in Equations (3.28) and (3.29) are now 
called the marginal density functions of $X$ and $Y$, respectively. 

Example 3.7. Problem: a boy and a girl plan to meet at a certain place between 
9 a.m. and 10 a.m., each not waiting more than 10 minutes for the other. If all 
times of arrival within the hour are equally likely for each person, and if their 
times of arrival are independent, find the probability that they will meet. 

Answer: for a single continuous random variable $X$ that takes all values over 
an interval $a$ to $b$ with equal likelihood, the distribution is called a uniform 
distribution and its density function $f_X(x)$ has the form 

$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & \text{for } a \le x \le b; \\ 0, & \text{elsewhere}. \end{cases} \qquad (3.30)$$

The height of $f_X(x)$ over the interval $(a, b)$ must be $1/(b-a)$ in order that the 
area below the curve is 1 (see Figure 3.14). For a two-dimensional case as 
described in this example, the joint density function of two independent 
uniformly distributed random variables is a flat surface within prescribed bounds. 
The volume under the surface is unity. 

Let the boy arrive at $X$ minutes past 9 a.m. and the girl arrive at $Y$ minutes past 
9 a.m. The jpdf $f_{XY}(x,y)$ thus takes the form shown in Figure 3.15 and is given by 

$$f_{XY}(x,y) = \begin{cases} \dfrac{1}{3600}, & \text{for } 0 \le x \le 60 \text{ and } 0 \le y \le 60; \\ 0, & \text{elsewhere}. \end{cases}$$
Figure 3.14 A uniform density function, $f_X(x)$ 

Figure 3.15 The joint probability density function, $f_{XY}(x,y)$, for Example 3.7 

The probability we are seeking is thus the volume under this surface over an 
appropriate region $R$. For this problem, the region $R$ is given by 

$$R: |X - Y| \le 10$$

and is shown in Figure 3.16 in the $(x,y)$ plane. 

The volume of interest can be found by inspection in this simple case. 
Dividing $R$ into three regions as shown, we have 

$$P(\text{they will meet}) = P(|X - Y| \le 10) = \frac{2(5)(10) + 10\sqrt{2}\,(50\sqrt{2})}{3600} = \frac{11}{36}.$$
Figure 3.16 Region $R$ in Example 3.7 

Note that, for a more complicated jpdf, one needs to carry out the volume 
integral $\iint_R f_{XY}(x,y)\,dx\,dy$ for volume calculations. 
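A quick numerical cross-check of Example 3.7 is sketched below in Python. Instead of dividing $R$ into three pieces as the text does, it computes the exact answer through the complement: the event $|X - Y| > 10$ covers two corner triangles with legs of 50 minutes. It then estimates the same probability by Monte Carlo simulation of two independent arrival times, each uniform on $(0, 60)$. The seed and the number of trials are arbitrary choices for this illustration.

```python
import random
from fractions import Fraction

# Exact value by complement: two corner triangles, each of area 50 * 50 / 2,
# under the constant height 1/3600 of the jpdf.
exact = 1 - Fraction(50 * 50, 3600)

# Monte Carlo check: X and Y independent and uniform on (0, 60) minutes.
rng = random.Random(12345)  # fixed seed so the run is reproducible
trials = 200_000
hits = sum(abs(rng.uniform(0, 60) - rng.uniform(0, 60)) <= 10 for _ in range(trials))
estimate = hits / trials

print(exact, estimate, float(exact))
```

With 200,000 trials the standard error of the estimate is roughly 0.001, so agreement to about two decimal places is what one should expect.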

As an exercise, let us determine the joint probability distribution function 
and the marginal density functions of random variables X and Y defined in 
Example 3.7. 

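The JPDF of $X$ and $Y$ is obtained from Equation (3.25). It is clear that 

$$F_{XY}(x,y) = \begin{cases} 0, & \text{for } x < 0 \text{ or } y < 0; \\ 1, & \text{for } x \ge 60 \text{ and } y \ge 60. \end{cases}$$

Within the region $0 \le x \le 60$, $0 \le y \le 60$, we have 

$$F_{XY}(x,y) = \int_0^y \int_0^x \frac{1}{3600}\,du\,dv = \frac{xy}{3600}.$$

(If one coordinate exceeds 60 while the other lies in $[0, 60]$, the same formula applies with that coordinate replaced by 60.) 

For marginal density functions, Equations (3.28) and (3.29) give us 

$$f_X(x) = \begin{cases} \displaystyle\int_0^{60} \frac{1}{3600}\,dy = \frac{1}{60}, & \text{for } 0 \le x \le 60; \\ 0, & \text{elsewhere}. \end{cases}$$

Similarly, 

$$f_Y(y) = \begin{cases} \dfrac{1}{60}, & \text{for } 0 \le y \le 60; \\ 0, & \text{elsewhere}. \end{cases}$$

Both random variables are thus uniformly distributed over the interval $(0, 60)$. 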

Example 3.8. In structural reliability studies, the resistance $Y$ of a structural 
element and the force $X$ applied to it are generally regarded as random 
variables. The probability of failure, $p_f$, is defined by $P(Y \le X)$. Suppose that the 
jpdf of $X$ and $Y$ is specified to be 

$$f_{XY}(x,y) = \begin{cases} ab\,e^{-(ax+by)}, & \text{for } x > 0 \text{ and } y > 0; \\ 0, & \text{elsewhere}; \end{cases}$$

where $a$ and $b$ are known positive constants. We wish to determine $p_f$. 

The probability $p_f$ is determined from 

$$p_f = \iint_R f_{XY}(x,y)\,dx\,dy,$$

where $R$ is the region satisfying $Y \le X$. Since $X$ and $Y$ take only positive values, 
the region $R$ is that shown in Figure 3.17. Hence, 

$$p_f = \int_0^{\infty}\int_y^{\infty} ab\,e^{-(ax+by)}\,dx\,dy = \frac{b}{a+b}.$$

Figure 3.17 Region $R$ in Example 3.8 
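The closed form $p_f = b/(a+b)$ can be checked by simulation. The Python sketch below assumes the hypothetical values $a = 2$ and $b = 1$, so that $p_f = 1/3$. Because this particular jpdf factors into $a e^{-ax}$ times $b e^{-by}$, $X$ and $Y$ can be sampled as independent exponential random variables with the standard library's `random.Random.expovariate`. The seed and trial count are arbitrary.

```python
import random

a, b = 2.0, 1.0  # hypothetical parameter values; the text keeps a and b symbolic
closed_form = b / (a + b)

# a*b*exp(-(a*x + b*y)) = [a*exp(-a*x)] * [b*exp(-b*y)], so X and Y can be
# sampled as independent exponentials with rates a (force) and b (resistance).
rng = random.Random(2024)  # fixed seed for reproducibility
trials = 200_000
failures = 0
for _ in range(trials):
    x = rng.expovariate(a)  # applied force X
    y = rng.expovariate(b)  # resistance Y
    if y <= x:              # failure event: resistance does not exceed the force
        failures += 1
estimate = failures / trials

print(closed_form, estimate)
```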
In closing this section, let us note that generalization to the case of many 
random variables is again straightforward. The joint distribution function of $n$ 
random variables $X_1, X_2, \ldots, X_n$, or $\mathbf{X}$, is given, by Equation (3.19), as 

$$F_{\mathbf{X}}(\mathbf{x}) = P(X_1 \le x_1 \cap X_2 \le x_2 \cap \cdots \cap X_n \le x_n). \qquad (3.31)$$

The corresponding joint density function, denoted by $f_{\mathbf{X}}(\mathbf{x})$, is then 

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^n F_{\mathbf{X}}(\mathbf{x})}{\partial x_1\,\partial x_2 \cdots \partial x_n}, \qquad (3.32)$$

if the indicated partial derivatives exist. Various properties possessed by these 
functions can be readily inferred from those indicated for the 
two-random-variable case. 


3.4 CONDITIONAL DISTRIBUTION AND INDEPENDENCE 

The important concepts of conditional probability and independence 
introduced in Sections 2.2 and 2.4 play equally important roles in the context of 
random variables. The conditional distribution function of a random variable $X$, 
given that another random variable $Y$ has taken a value $y$, is defined by 

$$F_{XY}(x|y) = P(X \le x \mid Y = y). \qquad (3.33)$$

Similarly, when random variable $X$ is discrete, the definition of conditional mass 
function of $X$ given $Y = y$ is 

$$p_{XY}(x|y) = P(X = x \mid Y = y). \qquad (3.34)$$

Using the definition of conditional probability given by Equation (2.24), 
we have 

$$p_{XY}(x|y) = P(X = x \mid Y = y) = \frac{P(X = x \cap Y = y)}{P(Y = y)} = \frac{p_{XY}(x,y)}{p_Y(y)}, \qquad p_Y(y) \ne 0,$$

or 

$$p_{XY}(x,y) = p_{XY}(x|y)\,p_Y(y), \qquad (3.35)$$
which is expected. It gives the relationship between the jpmf and the 
conditional mass function. As we will see in Example 3.9, it is sometimes more 
convenient to derive joint mass functions by using Equation (3.35), as 
conditional mass functions are more readily available. 

If random variables X and Y are independent, then the definition of independence, Equation (2.16), implies

$$p_{XY}(x|y) = p_X(x), \tag{3.36}$$

and Equation (3.35) becomes

$$p_{XY}(x,y) = p_X(x)\,p_Y(y). \tag{3.37}$$

Thus, when, and only when, random variables X and Y are independent, their jpmf is the product of the marginal mass functions.
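Equations (3.35) to (3.37) translate directly into operations on a table of joint probabilities. The sketch below stores a small hypothetical jpmf as a dictionary keyed by (x, y); the numbers are illustrative and not taken from the text. It then computes the marginals, the conditional mass function, and the independence check. Exact fractions are used so that the product test in Equation (3.37) can be an exact equality.

```python
from fractions import Fraction as Fr

# Hypothetical jpmf p_XY(x, y) on a 2 x 2 grid (illustrative numbers only).
jpmf = {(0, 0): Fr(1, 8), (0, 1): Fr(3, 8), (1, 0): Fr(1, 8), (1, 1): Fr(3, 8)}

def marginals(p):
    """Return (p_X, p_Y) by summing the jpmf over the other variable."""
    px, py = {}, {}
    for (x, y), prob in p.items():
        px[x] = px.get(x, 0) + prob
        py[y] = py.get(y, 0) + prob
    return px, py

def conditional_x_given_y(p, y):
    """p_XY(x|y) = p_XY(x, y) / p_Y(y), from Equation (3.35); requires p_Y(y) > 0."""
    _, py = marginals(p)
    return {x: prob / py[y] for (x, yy), prob in p.items() if yy == y}

def is_independent(p):
    """Check Equation (3.37): p_XY(x, y) == p_X(x) p_Y(y) for every pair (x, y)."""
    px, py = marginals(p)
    return all(p.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

print(marginals(jpmf))
print(conditional_x_given_y(jpmf, 1))
print(is_independent(jpmf))
```

A jpmf that puts mass only on (0, 0) and (1, 1) fails the check, because its missing cells have probability zero while the products of the marginals do not.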

Let X be a continuous random variable. A consistent definition of the conditional density function of X given Y = y, $f_{XY}(x|y)$, is the derivative of its corresponding conditional distribution function. Hence,

$$f_{XY}(x|y) = \frac{dF_{XY}(x|y)}{dx}, \tag{3.38}$$


where $F_{XY}(x|y)$ is defined in Equation (3.33). To see what this definition leads to, let us consider

$$P(x_1 < X \le x_2 \mid y_1 < Y \le y_2) = \frac{P(x_1 < X \le x_2 \cap y_1 < Y \le y_2)}{P(y_1 < Y \le y_2)}. \tag{3.39}$$


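In terms of the jpdf $f_{XY}(x,y)$, it is given by

$$\begin{aligned}
P(x_1 < X \le x_2 \mid y_1 < Y \le y_2) &= \frac{\int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy}{\int_{y_1}^{y_2}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy}\\
&= \frac{\int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy}{\int_{y_1}^{y_2} f_Y(y)\,dy}.
\end{aligned} \tag{3.40}$$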

By setting $x_1 = -\infty$, $x_2 = x$, $y_1 = y$, and $y_2 = y + \Delta y$, and by taking the limit $\Delta y \to 0$, Equation (3.40) reduces to

$$F_{XY}(x|y) = \frac{\int_{-\infty}^{x} f_{XY}(u,y)\,du}{f_Y(y)}, \tag{3.41}$$

provided that $f_Y(y) \ne 0$.
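The limiting step from Equation (3.40) to Equation (3.41) can be seen numerically. This sketch uses the hypothetical jpdf $f_{XY}(x,y) = x + y$ on the unit square, chosen only because every integral has a closed form. For it, $f_Y(y) = 1/2 + y$. The code evaluates the conditional probability over the window $y < Y \le y + \Delta y$ and shows that it approaches $F_{XY}(x|y)$ as $\Delta y$ shrinks.

```python
def cond_prob_window(x: float, y: float, dy: float) -> float:
    """P(X <= x | y < Y <= y + dy) for the hypothetical jpdf f_XY(x, y) = x + y on [0, 1]^2.

    Numerator:   int_y^{y+dy} int_0^x (u + v) du dv = dy*x**2/2 + x*((y+dy)**2 - y**2)/2
    Denominator: int_y^{y+dy} f_Y(v) dv, where f_Y(v) = 1/2 + v.
    """
    y2 = y + dy
    num = dy * x**2 / 2 + x * (y2**2 - y**2) / 2
    den = dy / 2 + (y2**2 - y**2) / 2
    return num / den

def cond_cdf(x: float, y: float) -> float:
    """Equation (3.41): F_XY(x|y) = int_0^x f_XY(u, y) du / f_Y(y), valid for 0 <= x, y <= 1."""
    return (x**2 / 2 + x * y) / (0.5 + y)

for dy in (0.1, 0.01, 0.001):
    print(dy, cond_prob_window(0.5, 0.3, dy), cond_cdf(0.5, 0.3))
```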
Now we see that Equation (3.38) leads to

$$f_{XY}(x|y) = \frac{dF_{XY}(x|y)}{dx} = \frac{f_{XY}(x,y)}{f_Y(y)}, \qquad f_Y(y) \ne 0, \tag{3.42}$$

which is in a form identical to that of Equation (3.35) for the mass functions, a satisfying result. We should add here that this relationship between the conditional density function and the joint density function is obtained at the expense of Equation (3.33) for $F_{XY}(x|y)$. We say ‘at the expense of’ because the definition given to $F_{XY}(x|y)$ does not lead to a convenient relationship between $F_{XY}(x|y)$ and $F_{XY}(x,y)$, that is,

$$F_{XY}(x,y) \ne F_{XY}(x|y)\,F_Y(y). \tag{3.43}$$

This inconvenience, however, is not a severe penalty, since we deal with density functions and mass functions more often.

When random variables X and Y are independent, $F_{XY}(x|y) = F_X(x)$ and, as seen from Equation (3.42),

$$f_{XY}(x|y) = f_X(x), \tag{3.44}$$

and

$$f_{XY}(x,y) = f_X(x)\,f_Y(y), \tag{3.45}$$

which shows again that the joint density function is equal to the product of the associated marginal density functions when X and Y are independent.

Finally, let us note that, when random variables X and Y are discrete,

$$F_{XY}(x|y) = \sum_{i:\ x_i \le x} p_{XY}(x_i|y), \tag{3.46}$$

and, in the case of a continuous random variable,

$$F_{XY}(x|y) = \int_{-\infty}^{x} f_{XY}(u|y)\,du. \tag{3.47}$$

Comparison of these equations with Equations (3.7) and (3.12) reveals that they are identical to those relating these functions for X alone.

Extensions of the above results to the case of more than two random variables are again straightforward. Starting from

$$P(ABC) = P(A|BC)\,P(B|C)\,P(C)$$

[see Equation (2.26)] for three events A, B, and C, we have, in the case of three random variables X, Y, and Z,

$$\begin{aligned}
p_{XYZ}(x,y,z) &= p_{XYZ}(x|y,z)\,p_{YZ}(y|z)\,p_Z(z),\\
f_{XYZ}(x,y,z) &= f_{XYZ}(x|y,z)\,f_{YZ}(y|z)\,f_Z(z).
\end{aligned} \tag{3.48}$$


Hence, for the general case of n random variables, $X_1, X_2, \ldots, X_n$, or $\mathbf{X}$, we can write

$$\begin{aligned}
p_{\mathbf{X}}(\mathbf{x}) &= p_{X_1X_2\cdots X_n}(x_1|x_2,\ldots,x_n)\,p_{X_2\cdots X_n}(x_2|x_3,\ldots,x_n)\cdots p_{X_{n-1}X_n}(x_{n-1}|x_n)\,p_{X_n}(x_n),\\
f_{\mathbf{X}}(\mathbf{x}) &= f_{X_1X_2\cdots X_n}(x_1|x_2,\ldots,x_n)\,f_{X_2\cdots X_n}(x_2|x_3,\ldots,x_n)\cdots f_{X_{n-1}X_n}(x_{n-1}|x_n)\,f_{X_n}(x_n).
\end{aligned} \tag{3.49}$$

In the event that these random variables are mutually independent, Equations (3.49) become

$$\begin{aligned}
p_{\mathbf{X}}(\mathbf{x}) &= p_{X_1}(x_1)\,p_{X_2}(x_2)\cdots p_{X_n}(x_n),\\
f_{\mathbf{X}}(\mathbf{x}) &= f_{X_1}(x_1)\,f_{X_2}(x_2)\cdots f_{X_n}(x_n).
\end{aligned} \tag{3.50}$$

Example 3.9. To show that joint mass functions are sometimes more easily found by first finding the conditional mass functions, let us consider a traffic problem as described below.

Problem: a group of n cars enters an intersection from the south. Through prior observations, it is estimated that each car has probability p of turning east, probability q of turning west, and probability r of going straight on ($p + q + r = 1$). Assume that drivers behave independently and let X be the number of cars turning east and Y the number turning west. Determine the jpmf $p_{XY}(x,y)$.

Answer: since

$$p_{XY}(x,y) = p_{XY}(x|y)\,p_Y(y),$$

we proceed by determining $p_{XY}(x|y)$ and $p_Y(y)$. The marginal mass function $p_Y(y)$ is found in a way very similar to that in the random walk situation described in Example 3.5. Each car has two alternatives: turning west, and not turning west. By enumeration, we can show that Y has a binomial distribution (to be more fully justified in Chapter 6):

$$p_Y(y) = \binom{n}{y} q^y (1-q)^{n-y}, \qquad y = 0, 1, \ldots, n. \tag{3.51}$$
Consider now the conditional mass function $p_{XY}(x|y)$. With Y = y having happened, the situation is again similar to that for determining $p_Y(y)$ except that the number of cars available for taking possible eastward turns is now n − y; also, here, the probabilities p and r need to be renormalized so that they sum to 1. Hence, $p_{XY}(x|y)$ takes the form

$$p_{XY}(x|y) = \binom{n-y}{x}\left(\frac{p}{p+r}\right)^{x}\left(\frac{r}{p+r}\right)^{n-y-x}, \qquad x = 0, 1, \ldots, n-y, \quad y = 0, 1, \ldots, n. \tag{3.52}$$


Finally, we have $p_{XY}(x,y)$ as the product of the two expressions given by Equations (3.51) and (3.52). The ranges of values for x and y are x = 0, 1, ..., n − y, and y = 0, 1, ..., n.

Note that $p_{XY}(x,y)$ has a rather complicated expression that could not have been derived easily in a direct way. This also points out the need to exercise care in determining the limits of validity for x and y.
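The product of Equations (3.51) and (3.52) can be coded and checked on the stated ranges of x and y. The numbers n, p, q, r below are illustrative. The `multinomial` helper is an independent cross-check added for this sketch and is not part of the example. Algebraically, the product reduces to that form because 1 − q = p + r.

```python
from math import comb

def p_y(y, n, q):
    """Equation (3.51): binomial pmf of the number of cars turning west."""
    return comb(n, y) * q**y * (1 - q) ** (n - y)

def p_x_given_y(x, y, n, p, r):
    """Equation (3.52): binomial on the n - y remaining cars, p and r renormalized."""
    s = p + r
    return comb(n - y, x) * (p / s) ** x * (r / s) ** (n - y - x)

def p_xy(x, y, n, p, q, r):
    """Joint pmf p_XY(x, y) = p_XY(x|y) p_Y(y); zero outside 0 <= y <= n, 0 <= x <= n - y."""
    if not (0 <= y <= n and 0 <= x <= n - y):
        return 0.0
    return p_x_given_y(x, y, n, p, r) * p_y(y, n, q)

def multinomial(x, y, n, p, q, r):
    """Multinomial pmf, used here only as a cross-check of the product form."""
    z = n - x - y
    return comb(n, x) * comb(n - x, y) * p**x * q**y * r**z

n, p, q, r = 5, 0.3, 0.2, 0.5  # illustrative values with p + q + r = 1
total = sum(p_xy(x, y, n, p, q, r) for y in range(n + 1) for x in range(n - y + 1))
print(total)  # should be 1 up to rounding
```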

Example 3.10. Problem: resistors are designed to have a resistance R of 50 ± 2 Ω. Owing to imprecision in the manufacturing process, the actual density function of R has the form shown by the solid curve in Figure 3.18. Determine the density function of R after screening, that is, after all the resistors having resistances beyond the 48–52 Ω range are rejected.

Answer: we are interested in the conditional density function, $f_R(r|A)$, where A is the event $\{48 \le R \le 52\}$. This is not the usual conditional density function, but it can be found from the basic definition of conditional probability.

We start by considering

$$F_R(r|A) = P(R \le r \mid 48 \le R \le 52) = \frac{P(R \le r \cap 48 \le R \le 52)}{P(48 \le R \le 52)}.$$

Figure 3.18 The actual, $f_R(r)$, and conditional, $f_R(r|A)$, density functions for Example 3.10
However,

$$\{R \le r\} \cap \{48 \le R \le 52\} = \begin{cases} \emptyset, & r < 48;\\ \{48 \le R \le r\}, & 48 \le r \le 52;\\ \{48 \le R \le 52\}, & r > 52. \end{cases}$$

Hence,

$$F_R(r|A) = \begin{cases} 0, & r < 48;\\[4pt] \dfrac{P(48 \le R \le r)}{P(48 \le R \le 52)} = \dfrac{1}{c}\displaystyle\int_{48}^{r} f_R(u)\,du, & 48 \le r \le 52;\\[4pt] 1, & r > 52; \end{cases}$$

where

$$c = \int_{48}^{52} f_R(r)\,dr$$

is a constant.

The desired $f_R(r|A)$ is then obtained from the above by differentiation. We obtain

$$f_R(r|A) = \frac{dF_R(r|A)}{dr} = \begin{cases} \dfrac{f_R(r)}{c}, & 48 \le r \le 52;\\[4pt] 0, & \text{elsewhere}. \end{cases}$$

It can be seen from Figure 3.18 (dashed line) that the effect of screening is essentially a truncation of the tails of the distribution beyond the allowable limits. This is accompanied by an adjustment within the limits by a multiplicative factor 1/c so that the area under the curve is again equal to 1.
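The truncate-and-rescale step can be sketched numerically. The actual density of Figure 3.18 is only shown graphically, so this sketch assumes a hypothetical stand-in: a normal density with mean 50 Ω and standard deviation 1.5 Ω (the names `MU` and `SIGMA` and their values are illustrative). The code computes c, forms $f_R(r|A) = f_R(r)/c$ on [48, 52], and confirms with a midpoint-rule integral that the screened density has unit area.

```python
import math

MU, SIGMA = 50.0, 1.5  # hypothetical stand-in for the solid curve of Figure 3.18
LO, HI = 48.0, 52.0    # screening limits of Example 3.10

def f_r(r):
    """Hypothetical unscreened density f_R(r): normal with mean MU and std SIGMA."""
    return math.exp(-0.5 * ((r - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def normal_cdf(r):
    return 0.5 * (1 + math.erf((r - MU) / (SIGMA * math.sqrt(2))))

c = normal_cdf(HI) - normal_cdf(LO)  # c = P(48 <= R <= 52)

def f_r_given_a(r):
    """Screened density f_R(r|A) = f_R(r)/c on [48, 52], zero elsewhere."""
    return f_r(r) / c if LO <= r <= HI else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

print(c, integrate(f_r_given_a, LO, HI))  # second value should be close to 1
```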


FURTHER READING AND COMMENTS 

We discussed in Section 3.3 the determination of (unique) marginal distributions from a 
knowledge of joint distributions. It should be noted here that the knowledge of marginal 
distributions does not in general lead to a unique joint distribution. The following reference 
shows that all joint distributions having a specified set of marginals can be obtained by 
repeated applications of the so-called 9 transformation to the product of the marginals: 

Becker, P.W., 1970, “A Note on Joint Densities which have the Same Set of Marginal Densities”, in Proc. International Symp. Information Theory, Elsevier Scientific Publishers, The Netherlands.
PROBLEMS

3.1 For each of the functions given below, determine constant a so that it possesses all the properties of a probability distribution function (PDF). Determine, in each case, its associated probability density function (pdf) or probability mass function (pmf) if it exists and sketch all functions.

(a) Case 1:

$$F(x) = \begin{cases} 0, & x < 5;\\ a, & x \ge 5. \end{cases}$$

(b) Case 2:

$$F(x) = \begin{cases} 0, & x < 5;\\ \ldots, & 5 \le x < 7;\\ a, & x \ge 7. \end{cases}$$

(c) Case 3:

$$F(x) = \begin{cases} 0, & x < 1;\\ \displaystyle\sum_{j=1}^{k} \frac{1}{a^{j}}, & k \le x < k+1, \text{ and } k = 1, 2, 3, \ldots \end{cases}$$

(d) Case 4:

$$F(x) = \begin{cases} 0, & x < 0;\\ 1 - e^{-ax}, & x \ge 0. \end{cases}$$

(e) Case 5:

$$F(x) = \begin{cases} 0, & x < 0;\\ x^{a}, & 0 \le x \le 1;\\ 1, & x > 1. \end{cases}$$

(f) Case 6:

$$F(x) = \begin{cases} 0, & x < 0;\\ a\sin^{-1}\sqrt{x}, & 0 \le x \le 1;\\ 1, & x > 1. \end{cases}$$

(g) Case 7:

$$F(x) = \begin{cases} 0, & x < 0;\\ a(1 - \ldots) + \ldots, & x \ge 0. \end{cases}$$

3.2 For each part of Problem 3.1, determine: 

(a) P(X < 6); 

(b) P(\< X <7). 
Figure 3.19 The probability mass function, $p_X(x)$, and probability density function, $f_X(x)$, for Problem 3.3

3.3 For $p_X(x)$ and $f_X(x)$ in Figures 3.19(a) and 3.19(b), respectively, sketch roughly to scale the corresponding PDF $F_X(x)$ and show on all graphs the procedure for finding $P(2 < X < 4)$.

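3.4 For each part, find the corresponding PDF for random variable X.

(a) Case 1:

$$f_X(x) = \begin{cases} 0.1, & 90 \le x \le 100;\\ 0, & \text{elsewhere}. \end{cases}$$

(b) Case 2:

$$f_X(x) = \begin{cases} 2(1 - x), & 0 \le x \le 1;\\ 0, & \text{elsewhere}. \end{cases}$$

(c) Case 3:

$$f_X(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$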


3.5 The pdf of X is shown in Figure 3.20. 

(a) Determine the value of a. 

(b) Graph Fx(x) approximately. 

(c) Determine P(X > 2). 

(d) Determine P(X > 2\X > 1). 

3.6 The life X, in hours, of a certain kind of electronic component has a pdf given by

$$f_X(x) = \begin{cases} 0, & x < 100;\\[4pt] \dfrac{100}{x^2}, & x \ge 100. \end{cases}$$

Determine the probability that a component will survive 150 hours of operation.
3.7 Let T denote the life (in months) of a light bulb and let

$$f_T(t) = \begin{cases} \dfrac{1}{15}\left(1 - \dfrac{t}{30}\right), & 0 \le t \le 30;\\[4pt] 0, & \text{elsewhere}. \end{cases}$$

(a) Plot $f_T(t)$ against t.

(b) Derive $F_T(t)$ and plot $F_T(t)$ against t.

(c) Determine, using $f_T(t)$, the probability that the light bulb will last at least 15 months.

(d) Determine, using $F_T(t)$, the probability that the light bulb will last at least 15 months.

(e) A light bulb has already lasted 15 months. What is the probability that it will survive another month?

3.8 The time, in minutes, required for a student to travel from home to a morning 
class is uniformly distributed between 20 and 25. If the student leaves home 
promptly at 7:38 a.m., what is the probability that the student will not be late for 
class at 8:00 a.m.? 

3.9 In constructing the bridge shown in Figure 3.21, an engineer is concerned with 
forces acting on the end supports caused by a randomly applied concentrated load 
P, the term ‘randomly applied’ meaning that the probability of the load lying in any 
region is proportional only to the length of that region. Suppose that the bridge has 
a span 2b. Determine the PDF and pdf of random variable X, which is the distance 
from the load to the nearest edge support. Sketch these functions. 




Figure 3.21 Diagram of the bridge, for Problem 3.9 
Figure 3.22 Position of the fire station and stretch of forest, AB, for Problem 3.10


3.10 Fire can erupt at random at any point along a stretch of forest AB. The fire 
station is located as shown in Figure 3.22. Determine the PDF and pdf of 
X, representing the distance between the fire and the fire station. Sketch these 
functions. 

3.11 Pollutant concentrations caused by a pollution source can be modeled by the pdf 
(a > 0): 



where R is the distance from the source. Determine the radius within which 95% of 
the pollutant is contained. 

3.12 As an example of a mixed probability distribution, consider the following problem: 
a particle is at rest at the origin (x = 0) at time t = 0. At a randomly selected time 
uniformly distributed over the interval 0 < t < 1, the particle is suddenly given a 
velocity v in the positive x direction. 

(a) Show that X, the particle position at t (0 < t < 1), has the PDF shown in Figure 3.23.

(b) Calculate the probability that the particle is at least v/3 away from the origin at t = 1/2.

[Figure: $F_X(x)$ plotted against x; the curve is labeled $F_X(x) = 1 - t + x/v$]

Figure 3.23 The probability distribution function, $F_X(x)$, for Problem 3.12
3.13 For each of the joint probability mass functions (jpmf), $p_{XY}(x,y)$, or joint probability density functions (jpdf), $f_{XY}(x,y)$, given below (cases 1–4), determine:

(a) the marginal mass or density functions, and

(b) whether the random variables are independent.

(i) Case 1:

$$p_{XY}(x,y) = \begin{cases} 0.5, & (x,y) = (1,1);\\ 0.1, & (x,y) = (1,2);\\ 0.1, & (x,y) = (2,1);\\ 0.3, & (x,y) = (2,2). \end{cases}$$

(ii) Case 2:

$$f_{XY}(x,y) = \begin{cases} \ldots, & 0 < x < 1, \text{ and } \ldots;\\ 0, & \text{elsewhere}. \end{cases}$$

(iii) Case 3:

$$f_{XY}(x,y) = \begin{cases} e^{-(x+y)}, & (x,y) > (0,0);\\ 0, & \text{elsewhere}. \end{cases}$$

(iv) Case 4:

$$f_{XY}(x,y) = \begin{cases} 4y(x - y)\,e^{-(x+y)}, & 0 \le x < \infty, \text{ and } 0 \le y \le x;\\ 0, & \text{elsewhere}. \end{cases}$$

3.14 Suppose X and Y have jpmf

$$p_{XY}(x,y) = \begin{cases} 0.1, & (x,y) = (1,1);\\ 0.2, & (x,y) = (1,2);\\ 0.3, & (x,y) = (2,1);\\ 0.4, & (x,y) = (2,2). \end{cases}$$

(a) Determine marginal pmfs of X and Y.

(b) Determine P(X = 1).

(c) Determine P(2X < Y).

3.15 Let $X_1$, $X_2$, and $X_3$ be independent random variables, each taking values ±1 with probabilities 1/2. Define random variables $Y_1$, $Y_2$, and $Y_3$ by

$$Y_1 = X_1X_2, \qquad Y_2 = X_1X_3, \qquad Y_3 = X_2X_3.$$

Show that any two of these new random variables are independent but that $Y_1$, $Y_2$, and $Y_3$ are not independent.

3.16 The random variables X and Y are distributed according to the jpdf given by Case 2 in Problem 3.13(ii). Determine:

(a) $P(X > 0.5 \cap Y > 1.0)$.

(b) P(XY < i). 
(c) $P(X < 0.5 \mid Y = 1.5)$.

(d) $P(X < 0.5 \mid Y < 1.5)$.

3.17 Let random variable X denote the time of failure in years of a system for which the 
PDF is Fx(x). In terms of Fx(x), determine the probability 

$$P(X \le x \mid X > 100),$$


which is the conditional distribution function of X given that the system did not fail 
up to 100 years. 

3.18 The pdf of random variable X is

$$f_X(x) = \begin{cases} 3x^2, & -1 \le x \le 0;\\ 0, & \text{elsewhere}. \end{cases}$$

Determine $P(X > b \mid X < b/2)$ with $-1 < b < 0$.

3.19 Using the joint probability distribution given in Example 3.5 for random variables 
X and Y, determine: 

(a) P(X > 3). 

(b) P(0 < Y < 3).

(c) $P(X > 3 \mid Y < 2)$.

3.20 Let

$$f_{XY}(x,y) = \begin{cases} k e^{\ldots}, & 0 \le x \le 1, \text{ and } 0 \le y \le 2;\\ 0, & \text{elsewhere}. \end{cases}$$

(a) What must be the value of k?

(b) Determine the marginal pdfs of X and Y. 

(c) Are X and Y statistically independent? Why? 

3.21 A commuter is accustomed to leaving home between 7:30 a.m. and 8:00 a.m., the drive
to the station taking between 20 and 30 minutes. It is assumed that departure time and 
travel time for the trip are independent random variables, uniformly distributed over 
their respective intervals. There are two trains the commuter can take; the first leaves 
at 8:05 a.m. and takes 30 minutes for the trip, and the second leaves at 8:25 a.m. and 
takes 35 minutes. What is the probability that the commuter misses both trains? 

3.22 The distance X (in miles) from a nuclear plant to the epicenter of potential earthquakes within 50 miles is distributed according to

$$f_X(x) = \begin{cases} \dfrac{2x}{50^2}, & 0 \le x \le 50;\\[4pt] 0, & \text{elsewhere}; \end{cases}$$

and the magnitude Y of potential earthquakes of scales 5 to 9 is distributed according to

$$f_Y(y) = \begin{cases} \dfrac{3(9 - y)^2}{64}, & 5 \le y \le 9;\\[4pt] 0, & \text{elsewhere}. \end{cases}$$

Assume that X and Y are independent. Determine $P(X \le 25 \cap Y > 8)$, the probability that the next earthquake within 50 miles will have a magnitude greater than 8 and that its epicenter will lie within 25 miles of the nuclear plant.

3.23 Let random variables X and Y be independent and uniformly distributed in the square $(0,0) < (X,Y) < (1,1)$. Determine the probability that $XY < 1/2$.

3.24 In splashdown maneuvers, spacecraft often miss the target because of guidance inaccuracies, atmospheric disturbances, and other error sources. Taking the origin of the coordinates as the designed point of impact, the X and Y coordinates of the actual impact point are random, with marginal density functions

$$f_X(x) = \frac{1}{\sigma(2\pi)^{1/2}}\,e^{-x^2/2\sigma^2}, \qquad -\infty < x < \infty;$$

$$f_Y(y) = \frac{1}{\sigma(2\pi)^{1/2}}\,e^{-y^2/2\sigma^2}, \qquad -\infty < y < \infty.$$

Assume that the random variables are independent. Show that the probability of a splashdown lying within a circle of radius a centered at the origin is $1 - e^{-a^2/2\sigma^2}$.

3.25 Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables, each with PDF $F_X(x)$. Show that

$$P[\min(X_1, X_2, \ldots, X_n) \le u] = 1 - [1 - F_X(u)]^n,$$

$$P[\max(X_1, X_2, \ldots, X_n) \le u] = [F_X(u)]^n.$$

The above are examples of extreme-value distributions. They are of considerable 
practical importance and will be discussed in Section 7.6. 

3.26 In studies of social mobility, assume that social classes can be ordered from 1 (professional) to 7 (unskilled). Let random variable $X_k$ denote the class order of the kth generation. Then, for a given region, the following information is given:

(i) The pmf of $X_0$ is described by $p_{X_0}(1) = 0.00$, $p_{X_0}(2) = 0.00$, $p_{X_0}(3) = 0.04$, $p_{X_0}(4) = 0.06$, $p_{X_0}(5) = 0.11$, $p_{X_0}(6) = 0.28$, and $p_{X_0}(7) = 0.51$.

(ii) The conditional probabilities $P(X_{k+1} = i \mid X_k = j)$ for i, j = 1, 2, ..., 7 and for every k are given in Table 3.2.

Table 3.2 $P(X_{k+1} = i \mid X_k = j)$ for Problem 3.26

          j = 1   j = 2   j = 3   j = 4   j = 5   j = 6   j = 7
 i = 1    0.388   0.107   0.035   0.021   0.009   0.000   0.000
 i = 2    0.146   0.267   0.101   0.039   0.024   0.013   0.008
 i = 3    0.202   0.227   0.188   0.112   0.075   0.041   0.036
 i = 4    0.062   0.120   0.191   0.212   0.123   0.088   0.083
 i = 5    0.140   0.206   0.357   0.430   0.473   0.391   0.364
 i = 6    0.047   0.053   0.067   0.124   0.171   0.312   0.235
 i = 7    0.015   0.020   0.061   0.062   0.125   0.155   0.274

(iii) The outcome at the (k + 1)th generation is dependent only on the class order at the kth generation and not on any generation prior to it; that is,

$$P(X_{k+1} = i \mid X_k = j \cap X_{k-1} = m \cap \cdots) = P(X_{k+1} = i \mid X_k = j).$$

Determine

(a) The pmf of $X_3$.

(b) The jpmf of $X_3$ and $X_4$.
While a probability distribution [$F_X(x)$, $p_X(x)$, or $f_X(x)$] contains a complete description of a random variable X, it is often of interest to seek a set of simple numbers that gives the random variable some of its dominant features. These numbers include moments of various orders associated with X. Let us first provide a general definition (Definition 4.1).

Definition 4.1. Let g(X) be a real-valued function of a random variable X. The mathematical expectation, or simply expectation, of g(X), denoted by $E\{g(X)\}$, is defined by

$$E\{g(X)\} = \sum_i g(x_i)\,p_X(x_i), \tag{4.1}$$

if X is discrete, where $x_1, x_2, \ldots$ are possible values assumed by X.

When the range of i extends from 1 to infinity, the sum in Equation (4.1) exists if it converges absolutely; that is,

$$\sum_{i=1}^{\infty} |g(x_i)|\,p_X(x_i) < \infty.$$

The symbol $E\{\ \}$ is regarded here and in the sequel as the expectation operator.

If random variable X is continuous, the expectation $E\{g(X)\}$ is defined by

$$E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx, \tag{4.2}$$

if the improper integral is absolutely convergent, that is,

$$\int_{-\infty}^{\infty} |g(x)|\,f_X(x)\,dx < \infty.$$
Let us note some basic properties associated with the expectation operator. For any constant c and any functions g(X) and h(X) for which expectations exist, we have

$$\begin{aligned}
E\{c\} &= c,\\
E\{cg(X)\} &= cE\{g(X)\},\\
E\{g(X) + h(X)\} &= E\{g(X)\} + E\{h(X)\},\\
E\{g(X)\} &\le E\{h(X)\}, \quad \text{if } g(X) \le h(X) \text{ for all values of } X.
\end{aligned} \tag{4.3}$$

These relations follow directly from the definition of $E\{g(X)\}$. For example,

$$E\{g(X) + h(X)\} = \int_{-\infty}^{\infty} [g(x) + h(x)]\,f_X(x)\,dx = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx + \int_{-\infty}^{\infty} h(x)\,f_X(x)\,dx = E\{g(X)\} + E\{h(X)\},$$

as given by the third of Equations (4.3). The proof is similar when X is discrete.


4.1 MOMENTS OF A SINGLE RANDOM VARIABLE 

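Let $g(X) = X^n$, n = 1, 2, ...; the expectation $E\{X^n\}$, when it exists, is called the nth moment of X. It is denoted by $\alpha_n$ and is given by

$$\alpha_n = E\{X^n\} = \sum_i x_i^n\,p_X(x_i), \quad \text{for } X \text{ discrete}; \tag{4.4}$$

$$\alpha_n = E\{X^n\} = \int_{-\infty}^{\infty} x^n f_X(x)\,dx, \quad \text{for } X \text{ continuous}. \tag{4.5}$$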


4.1.1 MEAN, MEDIAN, AND MODE 

One of the most important moments is $\alpha_1$, the first moment. Using the mass analogy for the probability distribution, the first moment may be regarded as the center of mass of its distribution. It is thus the average value of random variable X and certainly reveals one of the most important characteristics of its distribution. The first moment of X is synonymously called the mean, expectation, or average value of X. A common notation for it is $m_X$ or simply m.
Example 4.1. Problem: from Example 3.9 (page 64), determine the average number of cars turning west in a group of n cars.

Answer: we wish to determine the mean of Y, $E\{Y\}$, for which the mass function is [from Equation (3.51)]

$$p_Y(k) = \binom{n}{k} q^k (1-q)^{n-k}, \qquad k = 0, 1, 2, \ldots, n.$$

Equation (4.4) then gives

$$E\{Y\} = \sum_{k=0}^{n} k\,p_Y(k) = \sum_{k=0}^{n} k\binom{n}{k} q^k (1-q)^{n-k} = \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!}\,q^k (1-q)^{n-k}.$$

Let k − 1 = m. We have

$$E\{Y\} = nq\sum_{m=0}^{n-1} \binom{n-1}{m} q^m (1-q)^{n-1-m}.$$

The sum in this expression is simply the sum of binomial probabilities and hence equals one. Therefore,

$$E\{Y\} = nq,$$

which has a numerical value since n and q are known constants.
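The result E{Y} = nq can be confirmed by evaluating Equation (4.4) directly on the binomial pmf. Exact rational arithmetic is used so that the comparison is an equality rather than a tolerance. The values of n and q are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, q):
    """Equation (3.51), evaluated exactly when q is a Fraction."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)

def mean(pmf):
    """Equation (4.4) with n = 1: E{Y} = sum over k of k p_Y(k)."""
    return sum(k * pk for k, pk in pmf.items())

n, q = 10, Fraction(3, 10)  # illustrative values
pmf = {k: binomial_pmf(k, n, q) for k in range(n + 1)}
print(mean(pmf), n * q)  # both print 3
```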

Example 4.2. Problem: the waiting time X (in minutes) of a customer waiting to be served at a ticket counter has the density function

$$f_X(x) = \begin{cases} 2e^{-2x}, & x \ge 0;\\ 0, & \text{elsewhere}. \end{cases}$$

Determine the average waiting time.

Answer: referring to Equation (4.5), we have, using integration by parts,

$$E\{X\} = \int_0^{\infty} x\,(2e^{-2x})\,dx = \frac{1}{2}\ \text{minute}.$$

Example 4.3. Problem: from Example 3.10 (page 65), find the average resistance of the resistors after screening.

Answer: the average value required in this example is a conditional mean of R given the event A. Although no formal definition is given, it should be clear that the desired average is obtained from

$$E\{R|A\} = \int_{48}^{52} r\,f_R(r|A)\,dr = \int_{48}^{52} \frac{r\,f_R(r)}{c}\,dr.$$

This integral can be evaluated when $f_R(r)$ is specified.

Two other quantities in common usage that also give a measure of centrality 
of a random variable are its median and mode. 

A median of X is any point that divides the mass of the distribution into two equal parts; that is, $x_0$ is a median of X if

$$P(X \le x_0) = \frac{1}{2}.$$

The mean of X may not exist, but there exists at least one median.

In comparison with the mean, the median is sometimes preferred as a 
measure of central tendency when a distribution is skewed, particularly where 
there are a small number of extreme values in the distribution. For example, we 
speak of median income as a good central measure of personal income for a 
population. This is a better average because the median is not as sensitive to 
a small number of extremely high incomes or extremely low incomes as is 
the mean. 
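A small sample-based illustration of this sensitivity uses Python's `statistics` module. The incomes are made-up numbers, and sample statistics only stand in for the distributional mean and median discussed here. One extreme value pulls the mean far upward while barely moving the median.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars; one extreme value at the end.
incomes = [30, 35, 40, 42, 45, 50, 55, 60, 2000]

print(statistics.mean(incomes), statistics.median(incomes))            # mean is pulled far upward
print(statistics.mean(incomes[:-1]), statistics.median(incomes[:-1]))  # drop the extreme value
```

Removing the single extreme value changes the median only from 45 to 43.5. The mean drops from about 262 to about 44.6.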

Example 4.4. Let T be the time between emissions of particles by a radioactive atom. It is well established that T is a random variable and that it obeys an exponential distribution; that is,

$$f_T(t) = \begin{cases} \lambda e^{-\lambda t}, & t \ge 0;\\ 0, & \text{elsewhere}; \end{cases}$$

where λ is a positive constant. The random variable T is called the lifetime of the atom, and a common average measure of this lifetime is called the half-life, which is defined as the median of T. Thus, the half-life, τ, is found from

$$\int_0^{\tau} f_T(t)\,dt = \frac{1}{2},$$

or

$$\tau = \frac{\ln 2}{\lambda}.$$

Let us note that the mean life, $E\{T\}$, is given by

$$E\{T\} = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}.$$

A point $x_i$ such that

$$\begin{aligned}
p_X(x_i) &> p_X(x_{i+1}) \ \text{and}\ p_X(x_i) > p_X(x_{i-1}), && X \text{ discrete},\\
f_X(x_i) &> f_X(x_i + \varepsilon) \ \text{and}\ f_X(x_i) > f_X(x_i - \varepsilon), && X \text{ continuous},
\end{aligned}$$

where ε is an arbitrarily small positive quantity, is called a mode of X. A mode is thus a value of X corresponding to a peak in its mass function or density function. The term unimodal distribution refers to a probability distribution possessing a unique mode.

To give a comparison of these three measures of centrality of a random variable, Figure 4.1 shows their relative positions in three different situations. It is clear that the mean, the median, and the mode coincide when a unimodal distribution is symmetric.


4.1.2 CENTRAL MOMENTS, VARIANCE, AND STANDARD 
DEVIATION 

Besides the mean, the next most important moment is the variance, which measures the dispersion or spread of random variable X about its mean. Its definition will follow a general definition of central moments (see Definition 4.2).

Definition 4.2. The central moments of random variable X are the moments of X with respect to its mean. Hence, the nth central moment of X, $\mu_n$, is defined as

$$\mu_n = E\{(X - m)^n\} = \sum_i (x_i - m)^n\,p_X(x_i), \quad X \text{ discrete}; \tag{4.6}$$

$$\mu_n = E\{(X - m)^n\} = \int_{-\infty}^{\infty} (x - m)^n f_X(x)\,dx, \quad X \text{ continuous}. \tag{4.7}$$

The variance of X is the second central moment, $\mu_2$, commonly denoted by $\sigma_X^2$ or simply $\sigma^2$ or var(X). It is the most common measure of dispersion of a distribution about its mean. Large values of $\sigma_X^2$ imply a large spread in the distribution of X about its mean. Conversely, small values imply a sharp concentration of the mass of distribution in the neighborhood of the mean. This is illustrated in Figure 4.2, in which two density functions are shown with the same mean but different variances. When $\sigma_X^2 = 0$, the whole mass of the distribution is concentrated at the mean. In this extreme case, $X = m_X$ with probability 1.
Figure 4.1 Relative positions of the mean, median, and mode for three distributions: (a) positively skewed; (b) symmetrical; and (c) negatively skewed


An important relation between the variance and simple moments is

$$\sigma^2 = \alpha_2 - m^2. \tag{4.8}$$

This can be shown by making use of Equations (4.3). We get

$$\sigma^2 = E\{(X - m)^2\} = E\{X^2 - 2mX + m^2\} = E\{X^2\} - 2mE\{X\} + m^2 = \alpha_2 - 2m^2 + m^2 = \alpha_2 - m^2.$$
Figure 4.2 Density functions with different variances, $\sigma_1^2$ and $\sigma_2^2$

We note two other properties of the variance of a random variable X which can be similarly verified. They are:

$$\begin{aligned}
\operatorname{var}(X + c) &= \operatorname{var}(X),\\
\operatorname{var}(cX) &= c^2\operatorname{var}(X),
\end{aligned}$$

where c is any constant.
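Equation (4.8) and the two variance properties above can be checked on any discrete distribution. This sketch uses a small hypothetical pmf stored as a `{value: probability}` dictionary and exact fractions. The helper `transform` is an illustrative name. It assumes the map is one-to-one, so no two values merge.

```python
from fractions import Fraction

def moments(pmf):
    """Return (m, alpha_2, mu_2) for a discrete pmf given as {value: probability}."""
    m = sum(x * p for x, p in pmf.items())               # mean, Equation (4.4) with n = 1
    alpha2 = sum(x**2 * p for x, p in pmf.items())       # second moment
    mu2 = sum((x - m) ** 2 * p for x, p in pmf.items())  # variance, Equation (4.6) with n = 2
    return m, alpha2, mu2

def transform(pmf, g):
    """pmf of g(X); assumes g is one-to-one on the support so no masses merge."""
    return {g(x): p for x, p in pmf.items()}

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 4: Fraction(1, 4)}  # hypothetical pmf
m, alpha2, var = moments(pmf)
c = 3

print(var == alpha2 - m**2)                                       # Equation (4.8)
print(moments(transform(pmf, lambda x: x + c))[2] == var)         # var(X + c) = var(X)
print(moments(transform(pmf, lambda x: c * x))[2] == c**2 * var)  # var(cX) = c^2 var(X)
```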

It is further noted from Equations (4.6) and (4.7) that, for n = 2, each term in the sum in Equation (4.6) and the integrand in Equation (4.7) are nonnegative; hence the variance of a random variable is always nonnegative. The positive square root

$$\sigma_X = +[E\{(X - m_X)^2\}]^{1/2}$$

is called the standard deviation of X. An advantage of using $\sigma_X$ rather than $\sigma_X^2$ as a measure of dispersion is that it has the same unit as the mean. It can therefore be compared with the mean on the same scale to gain some measure of the degree of spread of the distribution. A dimensionless number that characterizes dispersion relative to the mean, and which also facilitates comparison among random variables of different units, is the coefficient of variation, $v_X$, defined by

$$v_X = \frac{\sigma_X}{m_X}. \tag{4.10}$$


Example 4.5. Let us determine the variance of Y defined in Example 4.1. Using Equation (4.8), we may write

$$\sigma_Y^2 = E\{Y^2\} - m_Y^2 = E\{Y^2\} - n^2q^2.$$

Now,

$$E\{Y^2\} = \sum_{k=0}^{n} k^2\,p_Y(k) = \sum_{k=0}^{n} k(k-1)\,p_Y(k) + \sum_{k=0}^{n} k\,p_Y(k),$$

and

$$\sum_{k=0}^{n} k\,p_Y(k) = nq.$$

Proceeding as in Example 4.1,

$$\begin{aligned}
\sum_{k=0}^{n} k(k-1)\,p_Y(k) &= n(n-1)q^2\sum_{k=2}^{n} \frac{(n-2)!}{(k-2)!\,(n-k)!}\,q^{k-2}(1-q)^{n-k}\\
&= n(n-1)q^2\sum_{j=0}^{n-2} \binom{n-2}{j} q^j (1-q)^{n-2-j}\\
&= n(n-1)q^2.
\end{aligned}$$

Thus,

$$E\{Y^2\} = n(n-1)q^2 + nq,$$

and

$$\sigma_Y^2 = n(n-1)q^2 + nq - (nq)^2 = nq(1 - q).$$
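The split of $E\{Y^2\}$ into the factorial moment $E\{Y(Y-1)\}$ plus $E\{Y\}$ can be reproduced with exact arithmetic. The values of n and q below are illustrative. Both intermediate results should match the closed forms derived above.

```python
from fractions import Fraction
from math import comb

n, q = 8, Fraction(1, 4)  # illustrative values
pmf = {k: comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)}

factorial_moment = sum(k * (k - 1) * p for k, p in pmf.items())  # E{Y(Y - 1)}
mean = sum(k * p for k, p in pmf.items())                         # E{Y}
second_moment = factorial_moment + mean                           # E{Y^2}
variance = second_moment - mean**2                                # Equation (4.8)

print(factorial_moment == n * (n - 1) * q**2, variance == n * q * (1 - q))  # True True
```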

Example 4.6. We again use Equation (4.8) to determine the variance of X defined in Example 4.2. The second moment of X is, on integrating by parts,

$$E\{X^2\} = 2\int_0^{\infty} x^2 e^{-2x}\,dx = \frac{1}{2}.$$

Hence,

$$\sigma_X^2 = E\{X^2\} - m_X^2 = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}.$$
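The integrals in Examples 4.2 and 4.6 can be approximated numerically as a cross-check on the integration by parts. This sketch applies the midpoint rule to $E\{g(X)\} = \int_0^\infty g(x)\,2e^{-2x}\,dx$. It truncates the improper integral at x = 40, an assumption that discards a negligible tail of order $e^{-80}$. The helper name `expect` is illustrative.

```python
import math

def expect(g, rate=2.0, upper=40.0, n=200_000):
    """Approximate E{g(X)} = int_0^inf g(x) * rate * exp(-rate x) dx by the midpoint rule.

    The improper integral is truncated at `upper`; for rate 2 the neglected
    tail mass beyond x = 40 is exp(-80).
    """
    h = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += g(x) * rate * math.exp(-rate * x)
    return total * h

m = expect(lambda x: x)          # Example 4.2: close to 1/2
second = expect(lambda x: x**2)  # Example 4.6: close to 1/2
print(m, second, second - m**2)  # the variance should be close to 1/4
```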

Example 4.7. Problem: owing to inherent manufacturing and scaling inaccuracies, the tape measures manufactured by a certain company have a standard deviation of 0.03 feet for a three-foot tape measure. What is a reasonable estimate of the standard deviation associated with three-yard tape measures made by the same manufacturer?

Answer: for this problem, it is reasonable to expect that errors introduced in the making of a three-foot tape measure again are accountable for inaccuracies in the three-yard tape measures. It is then reasonable to assume that the coefficient of variation v = σ/m is constant for tape measures of all lengths manufactured by this company. Thus

$$v = \frac{0.03}{3} = 0.01,$$

and the standard deviation for a three-yard tape measure is 0.01 × (9 feet) = 0.09 feet.
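The reasoning in Example 4.7 amounts to holding Equation (4.10) fixed and rescaling. The small helper below is an illustrative name, not from the text. It assumes, as the example does, that the coefficient of variation is the same for every length.

```python
def std_from_cv(sigma_ref: float, mean_ref: float, mean_new: float) -> float:
    """Scale a standard deviation to a new mean, assuming v = sigma/m stays constant."""
    v = sigma_ref / mean_ref  # coefficient of variation, Equation (4.10)
    return v * mean_new

# Example 4.7: 0.03 ft for a 3 ft tape; a three-yard tape is 9 ft long.
print(std_from_cv(0.03, 3.0, 9.0))  # approximately 0.09
```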

This example illustrates the fact that the coefficient of variation is often 
used as a measure of quality for products of different sizes or different weights. 
In the concrete industry, for example, the quality in terms of concrete strength 
is specified by a coefficient of variation, which is a constant for all mean 
strengths. 

Central moments of higher order reveal additional features of a distribution. The coefficient of skewness, defined by

$$\gamma_1 = \frac{\mu_3}{\sigma^3}, \tag{4.11}$$

gives a measure of the symmetry of a distribution. It is positive when a unimodal distribution has a dominant tail on the right. The opposite arrangement produces a negative $\gamma_1$. It is zero when a distribution is symmetrical about the mean. In fact, a distribution that is symmetrical about the mean has all of its odd-order central moments equal to zero (whenever they exist).

The degree of flattening of a distribution near its peaks can be measured by the coefficient of excess, defined by

$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3. \tag{4.12}$$

A positive $\gamma_2$ implies a sharp peak in the neighborhood of a mode in a unimodal distribution, whereas a negative $\gamma_2$ implies, as a rule, a flattened peak. The significance of the number 3 in Equation (4.12) will be discussed in Section 7.2, when the normal distribution is introduced.
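As a numerical illustration of Equations (4.11) and (4.12), this sketch computes the central moments of the exponential density $2e^{-2x}$ from Example 4.2 by midpoint-rule integration, truncated at x = 40 (an assumption with negligible error). It uses the known mean 1/rate. The long right tail gives a positive skewness, and the sharp peak at the origin gives a positive coefficient of excess. For any exponential density the exact values are 2 and 6.

```python
import math

def central_moment(n, rate=2.0, upper=40.0, steps=200_000):
    """mu_n = E{(X - m)^n} for an exponential density rate * exp(-rate x), midpoint rule.

    Uses the known mean m = 1/rate and truncates the integral at `upper`.
    """
    m = 1.0 / rate
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (x - m) ** n * rate * math.exp(-rate * x)
    return total * h

sigma = math.sqrt(central_moment(2))
gamma1 = central_moment(3) / sigma**3        # Equation (4.11)
gamma2 = central_moment(4) / sigma**4 - 3.0  # Equation (4.12)
print(gamma1, gamma2)  # close to 2 and 6
```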


4.1.3 CONDITIONAL EXPECTATION 

We conclude this section by introducing a useful relation involving conditional expectation. Let us denote by $E\{X|Y\}$ that function of random variable Y for which the value at $Y = y_i$ is $E\{X|Y = y_i\}$. Hence, $E\{X|Y\}$ is itself a random variable, and one of its very useful properties is that

$$E\{X\} = E\{E\{X|Y\}\}. \tag{4.13}$$

If Y is a discrete random variable taking on values $y_1, y_2, \ldots$, the above states that

$$E\{X\} = \sum_i E\{X|Y = y_i\}\,P(Y = y_i), \tag{4.14}$$

and

$$E\{X\} = \int_{-\infty}^{\infty} E\{X|y\}\,f_Y(y)\,dy, \tag{4.15}$$

if Y is continuous.

To establish the relation given by Equation (4.13), let us show that Equation (4.14) is true when both X and Y are discrete. Starting from the right-hand side of Equation (4.14), we have

$$\sum_i E\{X|Y = y_i\}\,P(Y = y_i) = \sum_i \sum_j x_j\,P(X = x_j|Y = y_i)\,P(Y = y_i).$$

Since, from Equation (2.24),

$$P(X = x_j|Y = y_i) = \frac{P(X = x_j \cap Y = y_i)}{P(Y = y_i)},$$

we have

$$\begin{aligned}
\sum_i E\{X|Y = y_i\}\,P(Y = y_i) &= \sum_i \sum_j x_j\,p_{XY}(x_j, y_i)\\
&= \sum_j x_j \sum_i p_{XY}(x_j, y_i)\\
&= \sum_j x_j\,p_X(x_j)\\
&= E\{X\},
\end{aligned}$$

and the desired result is obtained.

The usefulness of Equation (4.13) is analogous to what we found in using the theorem of total probability discussed in Section 2.4 (see Theorem 2.1, page 23). It states that E{X} can be found by taking a weighted average of the conditional expectations of X given $Y = y_i$, each of these terms being weighted by the probability $P(Y = y_i)$.
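Equation (4.14) can be verified on a small table. This sketch uses a hypothetical jpmf with illustrative values. It computes E{X} once directly and once as the probability-weighted average of the conditional means $E\{X|Y = y\}$, using exact fractions so that the two results can be compared for equality.

```python
from fractions import Fraction as Fr

# Hypothetical jpmf p_XY(x, y); values chosen only for illustration.
jpmf = {(1, 0): Fr(1, 10), (2, 0): Fr(3, 10), (1, 1): Fr(2, 10), (5, 1): Fr(4, 10)}

def p_y(y):
    """Marginal p_Y(y): sum the jpmf over x."""
    return sum(p for (_, yy), p in jpmf.items() if yy == y)

def cond_mean_x(y):
    """E{X | Y = y} = sum over x of x p_XY(x, y) / p_Y(y)."""
    return sum(x * p for (x, yy), p in jpmf.items() if yy == y) / p_y(y)

ys = {y for _, y in jpmf}
lhs = sum(x * p for (x, _), p in jpmf.items())  # E{X} computed directly
rhs = sum(cond_mean_x(y) * p_y(y) for y in ys)  # Equation (4.14)
print(lhs, rhs)  # 29/10 29/10
```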

Example 4.8. Problem: the survival of a motorist stranded in a snowstorm 
depends on which of the three directions the motorist chooses to walk. The first 
road leads to safety after one hour of travel, the second leads to safety after 
three hours of travel, but the third will circle back to the original spot after two 
hours. Determine the average time to safety if the motorist is equally likely to 
choose any one of the roads. 

Answer: let Y = 1, 2, and 3 denote the events that the motorist chooses the first, second, and third road, respectively. Then P(Y = i) = 1/3 for i = 1, 2, 3. Let X be the time to safety, in hours. We have

$$E\{X\} = \sum_{i=1}^{3} E\{X|Y = i\}\,P(Y = i) = \frac{1}{3}\sum_{i=1}^{3} E\{X|Y = i\}.$$

Now,

$$\begin{aligned}
E\{X|Y = 1\} &= 1,\\
E\{X|Y = 2\} &= 3,\\
E\{X|Y = 3\} &= 2 + E\{X\}.
\end{aligned} \tag{4.16}$$

Hence

$$E\{X\} = \frac{1}{3}\,(1 + 3 + 2 + E\{X\}),$$

or

$$E\{X\} = 3\ \text{hours}.$$

Let us remark that the third relation in Equations (4.16) is obtained by noting 
that, if the motorist chooses the third road, then it takes two hours to find that 
he or she is back to the starting point and the problem is as before. Hence, the 
motorist’s expected additional time to safety is just E{X}. The result is thus 
2 + E{X}. We further remark that problems of this type would require much 
more work were other approaches to be used. 
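The self-consistency argument can be checked by simulation. This sketch assumes, as the example implicitly does, that each time the motorist returns to the starting point the road is chosen afresh and uniformly at random; the seed and number of trials are arbitrary choices.

```python
import random

def time_to_safety(rng):
    """One simulated walk for Example 4.8: hours until the motorist reaches safety."""
    elapsed = 0.0
    while True:
        road = rng.randint(1, 3)          # each road equally likely, P(Y = i) = 1/3
        if road == 1:
            return elapsed + 1.0
        if road == 2:
            return elapsed + 3.0
        elapsed += 2.0                    # road 3 loops back after two hours; choose again

rng = random.Random(12345)
trials = 200_000
estimate = sum(time_to_safety(rng) for _ in range(trials)) / trials

# Solving E = (1 + 3 + 2 + E) / 3 from the text: 2E = 6, so E = 3.
exact = (1 + 3 + 2) / 2
print(f"simulated: {estimate:.3f} h, exact: {exact:.1f} h")
```

The recursion in Equations (4.16) replaces this open-ended loop with a single linear equation, which is why the conditional-expectation approach is so economical here.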
4.2 CHEBYSHEV INEQUALITY

In the discussion of expectations and moments, there are two aspects to be considered in applications. The first is that of calculating moments of various orders of a random variable knowing its distribution, and the second is concerned with making statements about the behavior of a random variable when only some of its moments are available. The second aspect arises in numerous practical situations in which available information leads only to estimates of some simple moments of a random variable.

The knowledge of the mean and variance of a random variable, although very useful, is not sufficient to determine its distribution and therefore does not permit us to give answers to such questions as ‘What is $P(X \le 5)$?’ However, as is shown in Theorem 4.1, it is possible to establish some probability bounds knowing only the mean and variance.

Theorem 4.1: the Chebyshev inequality states that

$$P(|X - m_X| \ge k\sigma_X) \le \frac{1}{k^2} \qquad (4.17)$$

for any k > 0.

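Proof: from the definition we have

$$\begin{aligned}
\sigma_X^2 &= \int_{-\infty}^{\infty}(x - m_X)^2 f_X(x)\,dx \ge \int_{|x - m_X| \ge k\sigma_X}(x - m_X)^2 f_X(x)\,dx \\
&\ge k^2\sigma_X^2 \int_{|x - m_X| \ge k\sigma_X} f_X(x)\,dx = k^2\sigma_X^2\,P(|X - m_X| \ge k\sigma_X).
\end{aligned}$$

Expression (4.17) follows. The proof is similar when X is discrete.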

Example 4.9. In Example 4.7, for three-foot tape measures, we can write

$$P(|X - 3| \ge 0.03k) \le \frac{1}{k^2}.$$

If k = 2,

$$P(|X - 3| \ge 0.06) \le \frac{1}{4},$$

or

$$P(2.94 < X < 3.06) \ge \frac{3}{4}.$$

In words, the probability of a three-foot tape measure being in error by less than 0.06 feet is at least 0.75. Various probability bounds can be found by assigning different values to k.

The complete generality with which the Chebyshev inequality is derived 
suggests that the bounds given by Equation (4.17) can be quite conservative. 
This is indeed true. Sharper bounds can be achieved if more is known about the 
distribution. 
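To see how conservative Expression (4.17) can be, the sketch below compares the bound $1/k^2$ with exact two-sided tail probabilities $P(|X - m_X| \ge k\sigma_X)$ for three familiar distributions: uniform on (0, 1), exponential with a = 1, and normal. These distributions are illustrative choices and are not taken from the text.

```python
import math

def chebyshev_bound(k):
    """Right-hand side of (4.17), capped at 1."""
    return min(1.0, 1.0 / k**2)

def tail_uniform(k):
    """Exact P(|X - m| >= k*sigma) for X uniform on (0, 1): m = 1/2, sigma = 1/sqrt(12)."""
    sigma = 1.0 / math.sqrt(12.0)
    return max(0.0, 1.0 - 2.0 * k * sigma)

def tail_exponential(k):
    """Exact P(|X - m| >= k*sigma) for X exponential with a = 1, so m = sigma = 1."""
    lower = 1.0 - math.exp(-(1.0 - k)) if k < 1 else 0.0   # P(X <= 1 - k)
    return lower + math.exp(-(1.0 + k))                      # plus P(X >= 1 + k)

def tail_normal(k):
    """Exact P(|X - m| >= k*sigma) for a normal X."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1.5, 2.0, 3.0):
    print(f"k={k}: bound={chebyshev_bound(k):.4f}  uniform={tail_uniform(k):.4f}  "
          f"exponential={tail_exponential(k):.4f}  normal={tail_normal(k):.4f}")
```

For k = 2, for example, the bound is 0.25 while the normal tail is only about 0.046.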


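4.3 MOMENTS OF TWO OR MORE RANDOM VARIABLES

Let g(X, Y) be a real-valued function of two random variables X and Y. Its expectation is defined by

$$E\{g(X, Y)\} = \sum_i \sum_j g(x_i, y_j)\,p_{XY}(x_i, y_j), \quad X \text{ and } Y \text{ discrete}, \qquad (4.18)$$

$$E\{g(X, Y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{XY}(x, y)\,dx\,dy, \quad X \text{ and } Y \text{ continuous}, \qquad (4.19)$$

if the indicated sums or integrals exist.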

In a completely analogous way, the joint moments $\alpha_{nm}$ of X and Y are given by, if they exist,

$$\alpha_{nm} = E\{X^n Y^m\}. \qquad (4.20)$$

They are computed from Equation (4.18) or (4.19) by letting $g(X, Y) = X^n Y^m$.

Similarly, the joint central moments of X and Y, when they exist, are given by

$$\mu_{nm} = E\{(X - m_X)^n (Y - m_Y)^m\}. \qquad (4.21)$$

They are computed from Equation (4.18) or (4.19) by letting $g(X, Y) = (X - m_X)^n (Y - m_Y)^m$.

Some of the most important moments in the two-random-variable case are clearly the individual means and variances of X and Y. In the notation used here, the means of X and Y are, respectively, $\alpha_{10}$ and $\alpha_{01}$. Using Equation (4.19), for example, we obtain:

$$\alpha_{10} = E\{X\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f_{XY}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} x \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy\,dx = \int_{-\infty}^{\infty} x\,f_X(x)\,dx,$$

where $f_X(x)$ is the marginal density function of X. We thus see that the result is identical to that in the single-random-variable case.

This observation is, of course, also true for the individual variances. They are, respectively, $\mu_{20}$ and $\mu_{02}$, and can be found from Equation (4.21) with appropriate substitutions for n and m. As in the single-random-variable case, we also have

$$\sigma_X^2 = \mu_{20} = \alpha_{20} - \alpha_{10}^2, \qquad \sigma_Y^2 = \mu_{02} = \alpha_{02} - \alpha_{01}^2. \qquad (4.22)$$


4.3.1 COVARIANCE AND CORRELATION COEFFICIENT

The first and simplest joint moment of X and Y that gives some measure of their interdependence is $\mu_{11} = E\{(X - m_X)(Y - m_Y)\}$. It is called the covariance of X and Y. Let us first note some of its properties.

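Property 4.1: the covariance is related to $\alpha_{nm}$ by

$$\mu_{11} = \alpha_{11} - \alpha_{10}\alpha_{01} = \alpha_{11} - m_X m_Y. \qquad (4.23)$$

Proof of Property 4.1: Property 4.1 is obtained by expanding $(X - m_X)(Y - m_Y)$ and then taking the expectation of each term. We have:

$$\begin{aligned}
\mu_{11} &= E\{(X - m_X)(Y - m_Y)\} = E\{XY - m_Y X - m_X Y + m_X m_Y\} \\
&= E\{XY\} - m_Y E\{X\} - m_X E\{Y\} + m_X m_Y \\
&= \alpha_{11} - \alpha_{10}\alpha_{01} - \alpha_{10}\alpha_{01} + \alpha_{10}\alpha_{01} \\
&= \alpha_{11} - \alpha_{10}\alpha_{01}.
\end{aligned}$$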
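Property 4.1 can be checked on any discrete joint distribution. This sketch computes $\mu_{11}$ both from its definition and from $\alpha_{11} - \alpha_{10}\alpha_{01}$ for a hypothetical joint mass function, using exact fractions. The helper `expect` is an illustrative name that implements Equation (4.18).

```python
from fractions import Fraction as F

# Hypothetical joint pmf p_XY(x, y); probabilities sum to 1.
p_xy = {(0, 0): F(1, 4), (0, 1): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 2)}

def expect(g, p):
    """E{g(X, Y)} for a discrete joint pmf, following Eq. (4.18)."""
    return sum(g(x, y) * pr for (x, y), pr in p.items())

m_x = expect(lambda x, y: x, p_xy)            # alpha_10
m_y = expect(lambda x, y: y, p_xy)            # alpha_01
alpha_11 = expect(lambda x, y: x * y, p_xy)

mu_11_definition = expect(lambda x, y: (x - m_x) * (y - m_y), p_xy)
mu_11_property = alpha_11 - m_x * m_y         # Eq. (4.23)
print(mu_11_definition, mu_11_property)       # prints: 7/64 7/64
```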


Property 4.2: let the correlation coefficient of X and Y be defined by

$$\rho = \frac{\mu_{11}}{(\mu_{20}\mu_{02})^{1/2}} = \frac{\mu_{11}}{\sigma_X\sigma_Y}. \qquad (4.24)$$

Then, $|\rho| \le 1$.
Proof of Property 4.2: to show Property 4.2, let t and u be any real quantities and form

$$\phi(t, u) = E\{[t(X - m_X) + u(Y - m_Y)]^2\} = \mu_{20}t^2 + 2\mu_{11}tu + \mu_{02}u^2.$$

Since the expectation of a nonnegative function of X and Y must be nonnegative, $\phi(t, u)$ is a nonnegative quadratic form in t and u. Its discriminant therefore cannot be positive, and we must have

$$\mu_{20}\mu_{02} - \mu_{11}^2 \ge 0, \qquad (4.25)$$

which gives the desired result.

The normalization of the covariance through Equation (4.24) renders ρ a useful substitute for $\mu_{11}$. Furthermore, the correlation coefficient is dimensionless and independent of the origin; that is, for any constants $a_1$, $a_2$, $b_1$, and $b_2$ with $a_1 > 0$ and $a_2 > 0$, we can easily verify that

$$\rho(a_1X + b_1, a_2Y + b_2) = \rho(X, Y). \qquad (4.26)$$
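Equation (4.26) is easy to confirm numerically. This sketch computes ρ from Equation (4.24) for a hypothetical joint mass function, then for the transformed pair $(a_1X + b_1, a_2Y + b_2)$. As an aside not stated in the text, a negative $a_1$ flips the sign of ρ, which is why the positivity conditions are needed.

```python
import math

def correlation(p):
    """rho = mu_11 / (sigma_X sigma_Y), Eq. (4.24), for a joint pmf {(x, y): probability}."""
    mx = sum(x * pr for (x, _), pr in p.items())
    my = sum(y * pr for (_, y), pr in p.items())
    mu11 = sum((x - mx) * (y - my) * pr for (x, y), pr in p.items())
    mu20 = sum((x - mx) ** 2 * pr for (x, _), pr in p.items())
    mu02 = sum((y - my) ** 2 * pr for (_, y), pr in p.items())
    return mu11 / math.sqrt(mu20 * mu02)

def affine(p, a1, b1, a2, b2):
    """Joint pmf of (a1 X + b1, a2 Y + b2); a1 and a2 must be nonzero."""
    return {(a1 * x + b1, a2 * y + b2): pr for (x, y), pr in p.items()}

p_xy = {(0, 0): 0.25, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.5}   # hypothetical pmf
rho = correlation(p_xy)
rho_shifted = correlation(affine(p_xy, 3.0, -7.0, 0.5, 10.0))   # a1, a2 > 0: unchanged
rho_flipped = correlation(affine(p_xy, -3.0, 0.0, 0.5, 0.0))    # a1 < 0: sign flips
print(rho, rho_shifted, rho_flipped)
```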

Property 4.3. If X and Y are independent, then

$$\mu_{11} = 0 \quad \text{and} \quad \rho = 0. \qquad (4.27)$$

Proof of Property 4.3: let X and Y be continuous; their joint moment $\alpha_{11}$ is found from

$$\alpha_{11} = E\{XY\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x, y)\,dx\,dy.$$

If X and Y are independent, we see from Equation (3.45) that

$$f_{XY}(x, y) = f_X(x)f_Y(y),$$

and

$$\alpha_{11} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_X(x)f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} x f_X(x)\,dx \int_{-\infty}^{\infty} y f_Y(y)\,dy = m_X m_Y.$$

Equations (4.23) and (4.24) then show that $\mu_{11} = 0$ and $\rho = 0$. A similar result can be obtained for two independent discrete random variables.
This result leads immediately to an important generalization. Consider a function of X and Y in the form g(X)h(Y) for which an expectation exists. Then, if X and Y are independent,

$$E\{g(X)h(Y)\} = E\{g(X)\}E\{h(Y)\}. \qquad (4.28)$$

When the correlation coefficient of two random variables vanishes, we say 
they are uncorrelated. It should be carefully pointed out that what we have 
shown is that independence implies zero correlation. The converse, however, is 
not true. This point is more fully discussed in what follows. 

The covariance or the correlation coefficient is of great importance in the analysis of two random variables. It is a measure of their linear interdependence in the sense that its value is a measure of the accuracy with which one random variable can be approximated by a linear function of the other. In order to see this, let us consider the problem of approximating a random variable X by a linear function of a second random variable Y, aY + b, where a and b are chosen so that the mean-square error e, defined by

$$e = E\{[X - (aY + b)]^2\}, \qquad (4.29)$$

is minimized. Upon taking partial derivatives of e with respect to a and b and setting them to zero, straightforward calculations show that this minimum is attained when

$$a = \frac{\sigma_X\rho}{\sigma_Y}$$

and

$$b = m_X - a\,m_Y.$$

Substituting these values into Equation (4.29) then gives $\sigma_X^2(1 - \rho^2)$ as the minimum mean-square error. We thus see that an exact fit in the mean-square sense is achieved when $|\rho| = 1$, and the linear approximation is the worst when $\rho = 0$. More specifically, when $\rho = +1$, the random variables X and Y are said to be positively perfectly correlated in the sense that the values they assume fall on a straight line with positive slope; they are negatively perfectly correlated when $\rho = -1$ and their values form a straight line with negative slope. These two extreme cases are illustrated in Figure 4.3. The value of $|\rho|$ decreases as scatter about these lines increases.
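The optimal coefficients and the minimum error $\sigma_X^2(1 - \rho^2)$ can be verified on a concrete case. This sketch uses a hypothetical joint mass function, evaluates e at the stated optimum, and confirms that nudging a or b in any direction does not lower the error.

```python
import math

# Hypothetical joint pmf of (X, Y).
p_xy = {(0.0, 0.0): 0.25, (1.0, 0.0): 0.125, (0.0, 1.0): 0.125, (1.0, 1.0): 0.5}

def mse(a, b, p):
    """e = E{[X - (aY + b)]^2}, Eq. (4.29)."""
    return sum((x - (a * y + b)) ** 2 * pr for (x, y), pr in p.items())

mx = sum(x * pr for (x, _), pr in p_xy.items())
my = sum(y * pr for (_, y), pr in p_xy.items())
sx = math.sqrt(sum((x - mx) ** 2 * pr for (x, _), pr in p_xy.items()))
sy = math.sqrt(sum((y - my) ** 2 * pr for (_, y), pr in p_xy.items()))
rho = sum((x - mx) * (y - my) * pr for (x, y), pr in p_xy.items()) / (sx * sy)

a_opt = sx * rho / sy          # a = sigma_X rho / sigma_Y
b_opt = mx - a_opt * my        # b = m_X - a m_Y
e_min = mse(a_opt, b_opt, p_xy)

# Nudging (a, b) should never lower the error below e_min.
nudged = [mse(a_opt + da, b_opt + db, p_xy) for da in (-0.1, 0.0, 0.1) for db in (-0.1, 0.0, 0.1)]
print(a_opt, b_opt, e_min, sx**2 * (1 - rho**2))
```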

Let us again stress the fact that the correlation coefficient measures only the linear interdependence between two random variables. It is by no means a general measure of interdependence between X and Y. Thus, ρ = 0 does not imply independence of the random variables. In fact, as Example 4.10 shows, the correlation coefficient can vanish when the values of one random variable are completely determined by the values of another.


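Example 4.10. Problem: determine the correlation coefficient of random variables X and Y when X takes values ±1 and ±2, each with probability 1/4, and $Y = X^2$.

Answer: clearly, Y assumes values 1 and 4, each with probability 1/2, and their joint mass function is given by:

$$p_{XY}(x, y) = \begin{cases} 1/4, & \text{for } (x, y) = (-1, 1); \\ 1/4, & \text{for } (x, y) = (1, 1); \\ 1/4, & \text{for } (x, y) = (-2, 4); \\ 1/4, & \text{for } (x, y) = (2, 4). \end{cases}$$

The means and second moment $\alpha_{11}$ are given by

$$\begin{aligned}
m_X &= (-2)\left(\tfrac{1}{4}\right) + (-1)\left(\tfrac{1}{4}\right) + (1)\left(\tfrac{1}{4}\right) + (2)\left(\tfrac{1}{4}\right) = 0, \\
m_Y &= (1)\left(\tfrac{1}{2}\right) + (4)\left(\tfrac{1}{2}\right) = 2.5, \\
\alpha_{11} &= (-1)(1)\left(\tfrac{1}{4}\right) + (1)(1)\left(\tfrac{1}{4}\right) + (-2)(4)\left(\tfrac{1}{4}\right) + (2)(4)\left(\tfrac{1}{4}\right) = 0.
\end{aligned}$$

Hence,

$$\mu_{11} = \alpha_{11} - m_X m_Y = 0,$$

and, from Equations (4.23) and (4.24),

$$\rho = 0.$$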

This is a simple example showing that X and Y are uncorrelated but they are 
completely dependent on each other in a nonlinear way. 
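The arithmetic of Example 4.10 can be reproduced exactly in Python. The sketch also checks one entry of the joint mass function against the product of marginals to make the dependence explicit: $P(X = 2, Y = 1) = 0$, whereas $P(X = 2)P(Y = 1) = 1/8$.

```python
from fractions import Fraction as F

# Example 4.10: X = +/-1, +/-2 with probability 1/4 each, and Y = X**2.
p_x = {-2: F(1, 4), -1: F(1, 4), 1: F(1, 4), 2: F(1, 4)}
p_xy = {(x, x * x): pr for x, pr in p_x.items()}

m_x = sum(x * pr for (x, _), pr in p_xy.items())
m_y = sum(y * pr for (_, y), pr in p_xy.items())
alpha_11 = sum(x * y * pr for (x, y), pr in p_xy.items())
mu_11 = alpha_11 - m_x * m_y          # Eq. (4.23)

# Dependence: the joint pmf does not factor into the marginals.
p_y1 = sum(pr for (_, y), pr in p_xy.items() if y == 1)
joint = p_xy.get((2, 1), F(0))
print(m_x, m_y, alpha_11, mu_11, joint, p_x[2] * p_y1)   # prints: 0 5/2 0 0 0 1/8
```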


4.3.2 SCHWARZ INEQUALITY

In Section 4.3.1, an inequality given by Equation (4.25) was established in the process of proving that $|\rho| \le 1$:

$$\mu_{11}^2 = |\mu_{11}|^2 \le \mu_{20}\mu_{02}. \qquad (4.30)$$

We can also show, following a similar procedure, that

$$E^2\{XY\} = |E\{XY\}|^2 \le E\{X^2\}E\{Y^2\}. \qquad (4.31)$$


Equations (4.30) and (4.31) are referred to as the Schwarz inequality. We point 
them out here because they are useful in a number of situations involving 
moments in subsequent chapters. 
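Inequality (4.31) can be probed empirically. This sketch draws many random discrete joint distributions (five support points each, an arbitrary choice) and records the largest value of $E^2\{XY\}/(E\{X^2\}E\{Y^2\})$. It then shows that the ratio reaches 1 when Y is proportional to X.

```python
import random

def moments(p):
    """E{XY}, E{X^2}, E{Y^2} for a discrete joint pmf given as {(x, y): probability}."""
    exy = sum(x * y * pr for (x, y), pr in p.items())
    exx = sum(x * x * pr for (x, _), pr in p.items())
    eyy = sum(y * y * pr for (_, y), pr in p.items())
    return exy, exx, eyy

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    # Random joint pmf on five randomly placed support points.
    points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(5)]
    weights = [rng.random() for _ in points]
    total = sum(weights)
    p = {pt: w / total for pt, w in zip(points, weights)}
    exy, exx, eyy = moments(p)
    worst_ratio = max(worst_ratio, exy**2 / (exx * eyy))   # (4.31): this ratio is <= 1

# Equality holds when Y is proportional to X (here Y = 3X).
exy, exx, eyy = moments({(1.0, 3.0): 0.5, (2.0, 6.0): 0.5})
equality_ratio = exy**2 / (exx * eyy)
print(worst_ratio, equality_ratio)
```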


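4.3.3 THE CASE OF THREE OR MORE RANDOM VARIABLES

The expectation of a function $g(X_1, X_2, \ldots, X_n)$ of n random variables $X_1, X_2, \ldots, X_n$ is defined in an analogous manner. Following Equations (4.18) and (4.19) for the two-random-variable case, we have

$$E\{g(X_1, \ldots, X_n)\} = \sum_{i_1}\cdots\sum_{i_n} g(x_{1i_1}, \ldots, x_{ni_n})\,p_{X_1\cdots X_n}(x_{1i_1}, \ldots, x_{ni_n}), \quad X_1, \ldots, X_n \text{ discrete}; \qquad (4.32)$$

$$E\{g(X_1, \ldots, X_n)\} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\,f_{X_1\cdots X_n}(x_1, \ldots, x_n)\,dx_1\cdots dx_n, \quad X_1, \ldots, X_n \text{ continuous}; \qquad (4.33)$$

where $p_{X_1\cdots X_n}$ and $f_{X_1\cdots X_n}$ are, respectively, the joint mass function and joint density function of the associated random variables.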

The important moments associated with n random variables are still the 
individual means, individual variances, and pairwise covariances. Let X be 






Expectations and Moments 




the random column vector with components $X_1, \ldots, X_n$, and let the means of $X_1, \ldots, X_n$ be represented by the vector $m_X$. A convenient representation of their variances and covariances is the covariance matrix, Λ, defined by

$$\Lambda = E\{(X - m_X)(X - m_X)^T\}, \qquad (4.34)$$

where the superscript T denotes the matrix transpose. The n × n matrix Λ has a structure in which the diagonal elements are the variances and the off-diagonal elements are covariances. Specifically, it is given by

$$\Lambda = \begin{bmatrix} \operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) & \cdots & \operatorname{cov}(X_1, X_n) \\ \operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) & \cdots & \operatorname{cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{cov}(X_n, X_1) & \operatorname{cov}(X_n, X_2) & \cdots & \operatorname{var}(X_n) \end{bmatrix}. \qquad (4.35)$$

In the above, ‘var’ reads ‘variance of’ and ‘cov’ reads ‘covariance of’. Since $\operatorname{cov}(X_i, X_j) = \operatorname{cov}(X_j, X_i)$, the covariance matrix is always symmetric.
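Equation (4.34) translates directly into code for a discrete random vector. This pure-Python sketch builds the covariance matrix for a hypothetical three-component vector with four equally likely outcomes. It shows the variances on the diagonal and the symmetry noted above.

```python
def covariance_matrix(outcomes, probs):
    """Lambda = E{(X - m_X)(X - m_X)^T}, Eq. (4.34), for a discrete random vector.

    outcomes: list of n-tuples (possible values of X_1, ..., X_n)
    probs:    matching probabilities
    """
    n = len(outcomes[0])
    m = [sum(o[i] * p for o, p in zip(outcomes, probs)) for i in range(n)]
    return [[sum((o[i] - m[i]) * (o[j] - m[j]) * p for o, p in zip(outcomes, probs))
             for j in range(n)]
            for i in range(n)]

# Hypothetical three-component random vector with four equally likely outcomes.
outcomes = [(1, 2, 0), (2, 1, 1), (3, 5, 1), (2, 4, 2)]
probs = [0.25] * 4
lam = covariance_matrix(outcomes, probs)
for row in lam:
    print(row)
```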

In closing, let us state (in Theorem 4.2) without proof an important result which is a direct extension of Equation (4.28).

Theorem 4.2: if $X_1, X_2, \ldots, X_n$ are mutually independent, then

$$E\{g_1(X_1)g_2(X_2)\cdots g_n(X_n)\} = E\{g_1(X_1)\}E\{g_2(X_2)\}\cdots E\{g_n(X_n)\}, \qquad (4.36)$$

where $g_j(X_j)$ is an arbitrary function of $X_j$. It is assumed, of course, that all indicated expectations exist.


4.4 MOMENTS OF SUMS OF RANDOM VARIABLES

Let $X_1, X_2, \ldots, X_n$ be n random variables. Their sum is also a random variable. In this section, we are interested in the moments of this sum in terms of those associated with $X_j$, j = 1, 2, ..., n. These relations find applications in a large number of derivations to follow and in a variety of physical situations.

Consider

$$Y = X_1 + X_2 + \cdots + X_n. \qquad (4.37)$$

Let $m_j$ and $\sigma_j^2$ denote the respective mean and variance of $X_j$. Results 4.1-4.3 are some of the important results concerning the mean and variance of Y. Verifications of these results are carried out for the case where $X_1, \ldots, X_n$ are continuous. The same procedures can be used when they are discrete.

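Result 4.1: the mean of the sum is the sum of the means; that is,

$$m_Y = m_1 + m_2 + \cdots + m_n. \qquad (4.38)$$

Proof of Result 4.1: to establish Result 4.1, consider

$$\begin{aligned}
m_Y = E\{Y\} &= E\{X_1 + X_2 + \cdots + X_n\} \\
&= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}(x_1 + \cdots + x_n)\,f_{X_1\cdots X_n}(x_1, \ldots, x_n)\,dx_1\cdots dx_n \\
&= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_1 f_{X_1\cdots X_n}(x_1, \ldots, x_n)\,dx_1\cdots dx_n + \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_2 f_{X_1\cdots X_n}(x_1, \ldots, x_n)\,dx_1\cdots dx_n + \cdots \\
&\quad + \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_n f_{X_1\cdots X_n}(x_1, \ldots, x_n)\,dx_1\cdots dx_n.
\end{aligned}$$

The first integral in the final expression can be immediately integrated with respect to $x_2, x_3, \ldots, x_n$, yielding $f_{X_1}(x_1)$, the marginal density function of $X_1$. Similarly, the (n - 1)-fold integration with respect to $x_1, x_3, \ldots, x_n$ in the second integral gives $f_{X_2}(x_2)$, and so on. Hence, the foregoing reduces to

$$m_Y = \int_{-\infty}^{\infty} x_1 f_{X_1}(x_1)\,dx_1 + \cdots + \int_{-\infty}^{\infty} x_n f_{X_n}(x_n)\,dx_n = m_1 + m_2 + \cdots + m_n.$$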


Combining Result 4.1 with some basic properties of the expectation, we obtain some useful generalizations. For example, in view of the second of Equations (4.3), we obtain Result 4.2.

Result 4.2: if

$$Z = a_1X_1 + a_2X_2 + \cdots + a_nX_n, \qquad (4.39)$$

where $a_1, a_2, \ldots, a_n$ are constants, then

$$m_Z = a_1m_1 + a_2m_2 + \cdots + a_nm_n. \qquad (4.40)$$

Result 4.3: let $X_1, \ldots, X_n$ be mutually independent random variables. Then the variance of the sum is the sum of the variances; that is,

$$\sigma_Y^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2. \qquad (4.41)$$


Let us verify Result 4.3 for n = 2. The proof for the case of n random variables follows at once by mathematical induction. Consider

$$Y = X_1 + X_2.$$

We know from Equation (4.38) that

$$m_Y = m_1 + m_2.$$

Subtracting $m_Y$ from Y, and $(m_1 + m_2)$ from $(X_1 + X_2)$, yields

$$Y - m_Y = (X_1 - m_1) + (X_2 - m_2)$$

and

$$\begin{aligned}
\sigma_Y^2 = E\{(Y - m_Y)^2\} &= E\{[(X_1 - m_1) + (X_2 - m_2)]^2\} \\
&= E\{(X_1 - m_1)^2 + 2(X_1 - m_1)(X_2 - m_2) + (X_2 - m_2)^2\} \\
&= E\{(X_1 - m_1)^2\} + 2E\{(X_1 - m_1)(X_2 - m_2)\} + E\{(X_2 - m_2)^2\} \\
&= \sigma_1^2 + 2\operatorname{cov}(X_1, X_2) + \sigma_2^2.
\end{aligned}$$

The covariance $\operatorname{cov}(X_1, X_2)$ vanishes, since $X_1$ and $X_2$ are independent [see Equation (4.27)]; thus the desired result is obtained.

Again, many generalizations are possible. For example, if Z is given by Equation (4.39), we have, following the second of Equations (4.9),

$$\sigma_Z^2 = a_1^2\sigma_1^2 + \cdots + a_n^2\sigma_n^2. \qquad (4.42)$$

Let us again emphasize that, whereas Equation (4.38) is valid for any set of random variables $X_1, \ldots, X_n$, Equation (4.41), pertaining to the variance, holds only under the independence assumption. Removal of the condition of independence would, as seen from the proof, add covariance terms to the right-hand side of Equation (4.41). It would then have the form

$$\begin{aligned}
\sigma_Y^2 &= \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 + 2\operatorname{cov}(X_1, X_2) + 2\operatorname{cov}(X_1, X_3) + \cdots + 2\operatorname{cov}(X_{n-1}, X_n) \\
&= \sum_{j=1}^{n}\sigma_j^2 + 2\sum_{i=1}^{n-1}\sum_{\substack{j=2 \\ i<j}}^{n}\operatorname{cov}(X_i, X_j).
\end{aligned} \qquad (4.43)$$
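A direct check of Equation (4.43): for a hypothetical vector $(X_1, X_2, X_3)$ with four equally likely outcomes, the sketch computes the variance of $Y = X_1 + X_2 + X_3$ directly, then from the variances plus the covariance terms. It also shows that the independence formula (4.41) gives a different answer here, because the components are correlated.

```python
# Hypothetical random vector (X1, X2, X3) with four equally likely outcomes.
outcomes = [(1, 2, 0), (2, 1, 1), (3, 5, 1), (2, 4, 2)]
probs = [0.25] * 4

def expect(g):
    """Expectation of g(X1, X2, X3) over the outcomes above."""
    return sum(g(o) * p for o, p in zip(outcomes, probs))

n = 3
m = [expect(lambda o, i=i: o[i]) for i in range(n)]

def cov(i, j):
    return expect(lambda o: (o[i] - m[i]) * (o[j] - m[j]))

# Left side of (4.43): variance of Y = X1 + X2 + X3 computed directly.
m_y = expect(sum)
var_y = expect(lambda o: (sum(o) - m_y) ** 2)

# Right side of (4.43): variances plus twice the covariances with i < j.
rhs = sum(cov(j, j) for j in range(n)) + 2 * sum(
    cov(i, j) for i in range(n - 1) for j in range(i + 1, n))

# Dropping the covariance terms, as (4.41) would, gives the wrong answer here.
independent_formula = sum(cov(j, j) for j in range(n))
print(var_y, rhs, independent_formula)  # prints: 6.5 6.5 3.5
```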
Example 4.11. Problem: an inspection is made of a group of n television picture tubes. If each passes the inspection with probability p and fails with probability q (p + q = 1), calculate the average number of tubes in n tubes that pass the inspection.

Answer: this problem may be easily solved if we introduce a random variable $X_j$ to represent the outcome of the jth inspection and define

$$X_j = \begin{cases} 1, & \text{if the } j\text{th tube passes inspection}; \\ 0, & \text{if the } j\text{th tube does not pass inspection}. \end{cases}$$

Then random variable Y, defined by

$$Y = X_1 + X_2 + \cdots + X_n,$$

has the desired property that its value is the total number of tubes passing the inspection. The mean of $X_j$ is

$$E\{X_j\} = 0(q) + 1(p) = p.$$

Therefore, as seen from Equation (4.38), the desired average number is given by

$$m_Y = E\{X_1\} + \cdots + E\{X_n\} = np.$$

We can also calculate the variance of Y. The variance of $X_j$ is given by

$$\sigma_j^2 = E\{(X_j - p)^2\} = (0 - p)^2 q + (1 - p)^2 p = pq.$$

If $X_1, \ldots, X_n$ are assumed to be independent, Equation (4.41) then gives

$$\sigma_Y^2 = \sigma_1^2 + \cdots + \sigma_n^2 = npq.$$

Example 4.12. Problem: let $X_1, \ldots, X_n$ be a set of mutually independent random variables with a common distribution, each having mean m. Show that, for every ε > 0,

$$P\left(\left|\frac{Y}{n} - m\right| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty, \qquad (4.44)$$

where $Y = X_1 + \cdots + X_n$.

Note: this is a statement of the law of large numbers. The random variable Y/n can be interpreted as an average of n independently observed random variables from the same distribution. Equation (4.44) then states that the probability that this average will differ from the mean by more than an arbitrarily prescribed ε tends to zero. In other words, for any fixed ε, the probability that Y/n lies within ε of the true mean m approaches 1 as n increases.

Answer: to proceed with the proof of Equation (4.44), we first note that, if $\sigma^2$ is the variance of each $X_j$, it follows from Equation (4.41) that

$$\sigma_Y^2 = n\sigma^2.$$

According to the Chebyshev inequality, given by Expression (4.17), for every k > 0, we have

$$P(|Y - nm| \ge k) \le \frac{n\sigma^2}{k^2}.$$

For k = εn, this gives $P(|Y/n - m| > \varepsilon) \le P(|Y/n - m| \ge \varepsilon) \le \sigma^2/(\varepsilon^2 n)$, which tends to zero as n → ∞. This establishes the proof.

Note that this proof requires the existence of $\sigma^2$. This is not necessary, but more work is required without this restriction.
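The proof suggests a simple experiment: estimate $P(|Y/n - m| > \varepsilon)$ by simulation for growing n and compare it with the Chebyshev bound $\sigma^2/(\varepsilon^2 n)$. This sketch assumes, purely for illustration, that each $X_j$ is uniform on (0, 1), so m = 1/2 and $\sigma^2 = 1/12$. The values of ε, the seed, and the number of replications are arbitrary.

```python
import random

def exceed_freq(n, eps, reps, rng):
    """Fraction of replications with |Y/n - m| > eps, X_j uniform on (0, 1) so m = 1/2."""
    count = 0
    for _ in range(reps):
        y = sum(rng.random() for _ in range(n))
        if abs(y / n - 0.5) > eps:
            count += 1
    return count / reps

rng = random.Random(2024)
eps, sigma2 = 0.05, 1.0 / 12.0          # variance of a uniform (0, 1) variable
results = {}
for n in (10, 100, 1000):
    freq = exceed_freq(n, eps, 2000, rng)
    bound = sigma2 / (eps**2 * n)        # Chebyshev bound from the proof above
    results[n] = (freq, bound)
    print(f"n={n:5d}  simulated P={freq:.4f}  bound={bound:.4f}")
```

The simulated probabilities fall much faster than the bound, which is consistent with the remark after Example 4.9 that Chebyshev bounds are conservative.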

Among many of its uses, statistical sampling is an example in which the law of large numbers plays an important role. Suppose that in a group of m families there are $m_j$ families with exactly j children (j = 0, 1, ..., and $m_0 + m_1 + \cdots = m$). For a family chosen at random, the number of children is a random variable that assumes the value r with probability $p_r = m_r/m$. A sample of n families among this group represents n observed independent random variables $X_1, \ldots, X_n$, with the same distribution. The quantity $(X_1 + \cdots + X_n)/n$ is the sample average, and the law of large numbers then states that, for sufficiently large samples, the sample average is likely to be close to

$$m = \sum_{r=0}^{\infty} r\,p_r = \sum_{r=0}^{\infty} r\,m_r/m,$$

the mean of the population.

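Example 4.13. The random variable Y/n in Example 4.12 is also called the sample mean associated with random variables $X_1, \ldots, X_n$ and is denoted by $\bar{X}$. In Example 4.12, if the coefficient of variation for each $X_j$ is v, the coefficient of variation $v_{\bar{X}}$ of $\bar{X}$ is easily derived from Equations (4.38) and (4.41) to be

$$v_{\bar{X}} = \frac{v}{\sqrt{n}}. \qquad (4.45)$$

Indeed, $\bar{X}$ has mean m and variance $n\sigma^2/n^2 = \sigma^2/n$, so its coefficient of variation is $(\sigma/\sqrt{n})/m = v/\sqrt{n}$.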
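Equation (4.45) can be illustrated by simulation. This sketch assumes each $X_j$ is exponential with a = 1, so that its mean and standard deviation are both 1 and v = 1. The sample size n = 25, the seed, and the replication count are arbitrary, so Equation (4.45) predicts a coefficient of variation of 1/5 for $\bar{X}$.

```python
import random
import statistics

def cv(values):
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

rng = random.Random(7)
n, reps = 25, 20_000
# Each X_j is exponential with a = 1, so its coefficient of variation is v = 1.
sample_means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
print(cv(sample_means), 1 / n**0.5)   # simulated value vs. Eq. (4.45)
```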


Equation (4.45) is the basis for the law of $\sqrt{n}$ by Schrödinger, which states that the laws of physics are accurate within a probable relative error of the order of $n^{-1/2}$, where n is the number of molecules that cooperate in a physical process. Basically, what Equation (4.45) suggests is that, if the action of each molecule






Fundamentals of Probability and Statistics for Engineers 


exhibits a random variation measured by v, then a physical process resulting from additive actions of n molecules will possess a random variation measured by $v/\sqrt{n}$, which decreases as n increases. Since n is generally very large in the workings of physical processes, this result leads to the conjecture that the laws of physics can be exact laws despite local disorder.


4.5 CHARACTERISTIC FUNCTIONS

The expectation $E\{e^{jtX}\}$ of a random variable X is defined as the characteristic function of X. Denoted by $\phi_X(t)$, it is given by

$$\phi_X(t) = E\{e^{jtX}\} = \sum_i e^{jtx_i}\,p_X(x_i), \quad X \text{ discrete}; \qquad (4.46)$$

$$\phi_X(t) = E\{e^{jtX}\} = \int_{-\infty}^{\infty} e^{jtx} f_X(x)\,dx, \quad X \text{ continuous}; \qquad (4.47)$$

where t is an arbitrary real-valued parameter and $j = \sqrt{-1}$. The characteristic function is thus the expectation of a complex function and is generally complex valued. Since

$$|e^{jtX}| = |\cos tX + j\sin tX| = 1,$$

the sum and the integral in Equations (4.46) and (4.47) exist and therefore $\phi_X(t)$ always exists. Furthermore, we note

$$\left.\begin{aligned}
\phi_X(0) &= 1, \\
\phi_X(-t) &= \phi_X^*(t), \\
|\phi_X(t)| &\le 1,
\end{aligned}\right\} \qquad (4.48)$$

where the asterisk denotes the complex conjugate. The first two properties are self-evident. The third relation follows from the observation that, since $f_X(x) \ge 0$,

$$|\phi_X(t)| = \left|\int_{-\infty}^{\infty} e^{jtx} f_X(x)\,dx\right| \le \int_{-\infty}^{\infty} |e^{jtx}|\,f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$

The proof for discrete random variables is the same.

We single this expectation out for discussion because it possesses a number of 
important properties that make it a powerful tool in random-variable analysis 
and probabilistic modeling. 
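For a discrete random variable, Equation (4.46) is a finite sum that is easy to evaluate with complex arithmetic. This sketch builds $\phi_X(t)$ for a hypothetical three-point mass function and checks the three properties in Equations (4.48) on a grid of t values.

```python
import cmath

def char_fn(pmf):
    """Return phi_X as a function of t, following Eq. (4.46); pmf maps x to P(X = x)."""
    def phi(t):
        return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())
    return phi

pmf = {-1: 0.2, 0: 0.3, 2: 0.5}         # hypothetical pmf
phi = char_fn(pmf)
ts = [0.37 * k for k in range(-20, 21)]

print(phi(0))                                              # phi_X(0) = 1
print(max(abs(phi(-t) - phi(t).conjugate()) for t in ts))  # phi_X(-t) = phi_X*(t)
print(max(abs(phi(t)) for t in ts))                        # |phi_X(t)| <= 1
```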
4.5.1 GENERATION OF MOMENTS

One of the important uses of characteristic functions is in the determination of the moments of a random variable. Expanding $\phi_X(t)$ as a Maclaurin series, we see that (suppressing the subscript X for convenience)

$$\phi(t) = \phi(0) + \phi'(0)t + \phi''(0)\frac{t^2}{2!} + \cdots + \phi^{(n)}(0)\frac{t^n}{n!} + \cdots, \qquad (4.49)$$

where the primes denote derivatives. The coefficients of this power series are, from Equation (4.47),

$$\left.\begin{aligned}
\phi(0) &= \int_{-\infty}^{\infty} f_X(x)\,dx = 1, \\
\phi'(0) &= \left.\frac{d\phi(t)}{dt}\right|_{t=0} = \int_{-\infty}^{\infty} jx\,f_X(x)\,dx = j\alpha_1, \\
&\;\;\vdots \\
\phi^{(n)}(0) &= \left.\frac{d^n\phi(t)}{dt^n}\right|_{t=0} = \int_{-\infty}^{\infty} j^n x^n f_X(x)\,dx = j^n\alpha_n.
\end{aligned}\right\} \qquad (4.50)$$

Thus,

$$\phi(t) = 1 + \sum_{n=1}^{\infty}\frac{\alpha_n}{n!}(jt)^n. \qquad (4.51)$$

The same results are obtained when X is discrete.

Equation (4.51) shows that moments of all orders, if they exist, are contained in the expansion of φ(t), and these moments can be found from φ(t) through differentiation. Specifically, Equations (4.50) give

$$\alpha_n = j^{-n}\phi^{(n)}(0), \quad n = 1, 2, \ldots \qquad (4.52)$$
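Equation (4.52) can be applied numerically when a closed-form derivative is inconvenient. This sketch approximates $\phi'(0)$ and $\phi''(0)$ with central finite differences (a numerical technique chosen here for illustration, not the text's analytical method). It then recovers $\alpha_1$ and $\alpha_2$ for a hypothetical mass function and compares them with direct sums.

```python
import cmath

def char_fn(pmf, t):
    """phi_X(t) from Eq. (4.46) for a pmf given as {x: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def moments_from_cf(pmf, h=1e-4):
    """Approximate alpha_1 and alpha_2 via Eq. (4.52), using central differences for phi' and phi''."""
    phi_p, phi_0, phi_m = char_fn(pmf, h), char_fn(pmf, 0.0), char_fn(pmf, -h)
    d1 = (phi_p - phi_m) / (2 * h)             # approximates phi'(0)
    d2 = (phi_p - 2 * phi_0 + phi_m) / h**2    # approximates phi''(0)
    return (d1 / 1j).real, (d2 / 1j**2).real

pmf = {-1: 0.2, 0: 0.3, 2: 0.5}     # hypothetical pmf
alpha1, alpha2 = moments_from_cf(pmf)
direct1 = sum(x * p for x, p in pmf.items())
direct2 = sum(x * x * p for x, p in pmf.items())
print(alpha1, direct1)   # both about 0.8
print(alpha2, direct2)   # both about 2.2
```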


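Example 4.14. Problem: determine the mean and the variance of a random variable X if it has the binomial distribution

$$p_X(k) = \binom{n}{k}p^k(1 - p)^{n-k}, \quad k = 0, 1, \ldots, n.$$

Answer: according to Equation (4.46),

$$\phi_X(t) = \sum_{k=0}^{n} e^{jtk}\binom{n}{k}p^k(1 - p)^{n-k} = [pe^{jt} + (1 - p)]^n.$$

Using Equation (4.52), we have

$$\begin{aligned}
\alpha_1 &= \frac{1}{j}\left.\frac{d}{dt}[pe^{jt} + (1 - p)]^n\right|_{t=0} = \frac{1}{j}\left.n[pe^{jt} + (1 - p)]^{n-1}(jpe^{jt})\right|_{t=0} = np, \\
\alpha_2 &= \frac{1}{j^2}\left.\frac{d^2}{dt^2}[pe^{jt} + (1 - p)]^n\right|_{t=0} = np[(n - 1)p + 1], \\
\sigma_X^2 &= \alpha_2 - \alpha_1^2 = np[(n - 1)p + 1] - n^2p^2 = np(1 - p).
\end{aligned}$$

The results for the mean and variance are the same as those obtained in Examples 4.1 and 4.5.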

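Example 4.15. Problem: repeat the above when X is exponentially distributed with density function

$$f_X(x) = \begin{cases} ae^{-ax}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere}. \end{cases}$$

Answer: the characteristic function in this case is

$$\phi_X(t) = \int_0^{\infty} e^{jtx}(ae^{-ax})\,dx = a\int_0^{\infty} e^{-(a - jt)x}\,dx = \frac{a}{a - jt}. \qquad (4.54)$$

The moments are

$$\begin{aligned}
\alpha_1 &= \frac{1}{j}\left.\frac{d}{dt}\left(\frac{a}{a - jt}\right)\right|_{t=0} = \frac{1}{j}\left.\frac{ja}{(a - jt)^2}\right|_{t=0} = \frac{1}{a}, \\
\alpha_2 &= \frac{1}{j^2}\left.\frac{d^2}{dt^2}\left(\frac{a}{a - jt}\right)\right|_{t=0} = \frac{1}{j^2}\left.\frac{2j^2a}{(a - jt)^3}\right|_{t=0} = \frac{2}{a^2}, \\
\sigma_X^2 &= \alpha_2 - \alpha_1^2 = \frac{1}{a^2},
\end{aligned}$$

which agree with the moment calculations carried out in Examples 4.2 and 4.6.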
Another useful expansion is the power series representation of the logarithm of the characteristic function; that is,

$$\log\phi_X(t) = \sum_{n=1}^{\infty}\frac{\lambda_n}{n!}(jt)^n, \qquad (4.55)$$

where coefficients $\lambda_n$ are again obtained from

$$\lambda_n = j^{-n}\left[\frac{d^n}{dt^n}\log\phi_X(t)\right]_{t=0}. \qquad (4.56)$$

The relations between coefficients $\lambda_n$ and moments $\alpha_n$ can be established by forming the exponential of $\log\phi_X(t)$, expanding this in a power series of jt, and equating coefficients to those of corresponding powers in Equation (4.51). We obtain

$$\left.\begin{aligned}
\lambda_1 &= \alpha_1, \\
\lambda_2 &= \alpha_2 - \alpha_1^2, \\
\lambda_3 &= \alpha_3 - 3\alpha_1\alpha_2 + 2\alpha_1^3, \\
\lambda_4 &= \alpha_4 - 3\alpha_2^2 - 4\alpha_1\alpha_3 + 12\alpha_1^2\alpha_2 - 6\alpha_1^4.
\end{aligned}\right\} \qquad (4.57)$$

It is seen that $\lambda_1$ is the mean, $\lambda_2$ is the variance, and $\lambda_3$ is the third central moment. The higher order $\lambda_n$ are related to the moments of the same order or lower, but in a more complex way. Coefficients $\lambda_n$ are called cumulants of X and, with a knowledge of these cumulants, we may obtain the moments and central moments.
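Relations (4.57) can be checked with exact arithmetic. This sketch computes $\lambda_1, \ldots, \lambda_4$ from the raw moments of a hypothetical mass function. It confirms that $\lambda_2$ and $\lambda_3$ equal the second and third central moments, and that $\lambda_4$ equals $\mu_4 - 3\mu_2^2$, a standard identity not stated in the text.

```python
from fractions import Fraction as F

def raw_moment(pmf, n):
    """alpha_n = E{X^n}."""
    return sum(x**n * p for x, p in pmf.items())

def central_moment(pmf, n):
    """mu_n = E{(X - m_X)^n}."""
    m = raw_moment(pmf, 1)
    return sum((x - m) ** n * p for x, p in pmf.items())

def cumulants(pmf):
    """lambda_1, ..., lambda_4 from the raw moments, Eq. (4.57)."""
    a1, a2, a3, a4 = (raw_moment(pmf, n) for n in range(1, 5))
    return (a1,
            a2 - a1**2,
            a3 - 3 * a1 * a2 + 2 * a1**3,
            a4 - 3 * a2**2 - 4 * a1 * a3 + 12 * a1**2 * a2 - 6 * a1**4)

pmf = {-1: F(1, 5), 0: F(3, 10), 2: F(1, 2)}     # hypothetical pmf
l1, l2, l3, l4 = cumulants(pmf)
mu2, mu3, mu4 = (central_moment(pmf, n) for n in (2, 3, 4))
print(l1, l2, l3, l4)
```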


4.5.2 INVERSION FORMULAE 

Another important use of characteristic functions follows from the inversion formulae to be developed below.

Consider first a continuous random variable X. We observe that Equation (4.47) also defines $\phi_X(t)$ as the inverse Fourier transform of $f_X(x)$. The other half of the Fourier transform pair is

$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jtx}\phi_X(t)\,dt. \qquad (4.58)$$

This inversion formula shows that knowledge of the characteristic function specifies the distribution of X. Furthermore, it follows from the theory of Fourier transforms that $f_X(x)$ is uniquely determined from Equation (4.58); that is, no two distinct density functions can have the same characteristic function.

This property of the characteristic function provides us with an alternative way of arriving at the distribution of a random variable. In many physical problems, it is often more convenient to determine the density function of a random variable by first determining its characteristic function and then performing the Fourier transform as indicated by Equation (4.58). Furthermore, we shall see that the characteristic function has properties that render it particularly useful for determining the distribution of a sum of independent random variables.

The inversion formula of Equation (4.58) follows immediately from the 
theory of Fourier transforms, but it is of interest to give a derivation of this 
equation from a probabilistic point of view. 


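Proof of Equation (4.58): an integration formula that can be found in any table of integrals is

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin at}{t}\,dt = \begin{cases} -1, & \text{for } a < 0; \\ 0, & \text{for } a = 0; \\ 1, & \text{for } a > 0. \end{cases} \qquad (4.59)$$

This leads to

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin at + j(1 - \cos at)}{t}\,dt = \begin{cases} -1, & \text{for } a < 0; \\ 0, & \text{for } a = 0; \\ 1, & \text{for } a > 0, \end{cases} \qquad (4.60)$$

because the function $(1 - \cos at)/t$ is an odd function of t so that its integral over a symmetric range vanishes. Noting that $\sin at + j(1 - \cos at) = j(1 - e^{jat})$, replacing a by X - x in Equation (4.60), and rearranging, we have

$$\frac{1}{2} - \frac{j}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{jt(X - x)}}{t}\,dt = \begin{cases} 1, & \text{for } X < x; \\ \tfrac{1}{2}, & \text{for } X = x; \\ 0, & \text{for } X > x. \end{cases} \qquad (4.61)$$

For a fixed value of x, Equation (4.61) is a function of random variable X, and it may be regarded as defining a new random variable Y. The random variable Y is seen to be discrete, taking on values 1, 1/2, and 0 with probabilities P(X < x), P(X = x), and P(X > x), respectively. The mean of Y is thus equal to

$$E\{Y\} = (1)P(X < x) + \left(\tfrac{1}{2}\right)P(X = x) + (0)P(X > x).$$

However, notice that, since X is continuous, P(X = x) = 0 if x is a point of continuity in the distribution of X. Hence, using Equation (4.47),

$$\begin{aligned}
E\{Y\} = P(X < x) = F_X(x) &= \frac{1}{2} - \frac{j}{2\pi}\int_{-\infty}^{\infty}\frac{1 - E\{e^{jt(X - x)}\}}{t}\,dt \\
&= \frac{1}{2} - \frac{j}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-jtx}\phi_X(t)}{t}\,dt.
\end{aligned} \qquad (4.62)$$

The above defines the probability distribution function of X. Its derivative gives the inversion formula

$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jtx}\phi_X(t)\,dt, \qquad (4.63)$$

and we have Equation (4.58), as desired.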
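Equation (4.58) can also be evaluated numerically. This sketch applies the trapezoidal rule to the characteristic function of a standard normal variable, $e^{-t^2/2}$, a standard result used here only as a test case. It compares the recovered values with the exact density. The truncation limit and the step count are arbitrary choices that work because this $\phi_X(t)$ decays very quickly.

```python
import cmath
import math

def invert_cf(phi, x, t_max=12.0, steps=4800):
    """Approximate f_X(x) = (1/2pi) * integral of exp(-j t x) phi(t) dt, Eq. (4.58),
    by the trapezoidal rule on [-t_max, t_max]."""
    h = 2 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * t * x) * phi(t)
    return (total * h / (2 * math.pi)).real

def phi_normal(t):
    """Characteristic function of a standard normal random variable."""
    return math.exp(-t * t / 2)

for x in (0.0, 1.0, 2.5):
    approx = invert_cf(phi_normal, x)
    exact = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    print(x, approx, exact)
```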

The inversion formula when X is a discrete random variable is

$$p_X(x) = \lim_{u\to\infty}\frac{1}{2u}\int_{-u}^{u} e^{-jtx}\phi_X(t)\,dt. \qquad (4.64)$$

A proof of this relation can be constructed along the same lines as that given above for the continuous case.

Proof of Equation (4.64): first note the standard integration formula:

$$\frac{1}{2u}\int_{-u}^{u} e^{jat}\,dt = \begin{cases} \dfrac{\sin au}{au}, & \text{for } a \ne 0; \\ 1, & \text{for } a = 0. \end{cases} \qquad (4.65)$$

Replacing a by X - x and taking the limit as u → ∞, we have a new random variable Y, defined by

$$Y = \lim_{u\to\infty}\frac{1}{2u}\int_{-u}^{u} e^{jt(X - x)}\,dt = \begin{cases} 0, & \text{for } X \ne x; \\ 1, & \text{for } X = x. \end{cases}$$

The mean of Y is given by

$$E\{Y\} = (1)P(X = x) + (0)P(X \ne x) = P(X = x), \qquad (4.66)$$

and therefore

$$\begin{aligned}
p_X(x) &= \lim_{u\to\infty}\frac{1}{2u}\int_{-u}^{u} E\{e^{jt(X - x)}\}\,dt \\
&= \lim_{u\to\infty}\frac{1}{2u}\int_{-u}^{u} e^{-jtx}\phi_X(t)\,dt,
\end{aligned} \qquad (4.67)$$

which gives the desired inversion formula.
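To see Equation (4.67) at work for a finite u, this sketch integrates $e^{-jtx}\phi_X(t)$ over [-u, u] with the midpoint rule for a hypothetical integer-valued mass function. By Equation (4.65), each support point other than x contributes a term of the form $\sin(au)/(au)$ with $|a| \ge 1$, so for u = 200 the recovered probabilities should be within about 1/u of the true ones. The values of u and the step count are arbitrary choices.

```python
import cmath

pmf = {-1: 0.2, 0: 0.3, 2: 0.5}    # hypothetical integer-valued pmf

def phi(t):
    """Characteristic function, Eq. (4.46)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def invert_discrete(x, u, steps):
    """(1/2u) * integral over [-u, u] of exp(-j t x) phi(t) dt, the finite-u version
    of Eq. (4.64), evaluated with the midpoint rule."""
    h = 2 * u / steps
    total = 0.0
    for i in range(steps):
        t = -u + (i + 0.5) * h
        total += cmath.exp(-1j * t * x) * phi(t)
    return (total * h / (2 * u)).real

recovered = {x: invert_discrete(x, u=200.0, steps=40_000) for x in (-1, 0, 1, 2)}
print(recovered)   # close to 0.2, 0.3, 0 and 0.5; the error shrinks roughly like 1/u
```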

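In summary, the transform pairs given by Equations (4.46), (4.47), (4.58), and (4.64) are collected and presented below for easy reference. For a continuous random variable X,

$$\left.\begin{aligned}
\phi_X(t) &= \int_{-\infty}^{\infty} e^{jtx} f_X(x)\,dx, \\
f_X(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jtx}\phi_X(t)\,dt;
\end{aligned}\right\} \qquad (4.68)$$

and, for a discrete random variable X,

$$\left.\begin{aligned}
\phi_X(t) &= \sum_i e^{jtx_i}\,p_X(x_i), \\
p_X(x) &= \lim_{u\to\infty}\frac{1}{2u}\int_{-u}^{u} e^{-jtx}\phi_X(t)\,dt.
\end{aligned}\right\} \qquad (4.69)$$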


Of the two sets, Equations (4.68) for the continuous case are more important in 
terms of applicability. As we shall see in Chapter 5, probability mass functions 
for discrete random variables can be found directly without resorting to their 
characteristic functions. 

As we have mentioned before, the characteristic function is particularly useful for the study of a sum of independent random variables. In this connection, let us state the following important theorem (Theorem 4.3).

Theorem 4.3: The characteristic function of a sum of independent random 
variables is equal to the product of the characteristic functions of the individual 
random variables. 

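Proof of Theorem 4.3: Let

$$Y = X_1 + X_2 + \cdots + X_n. \qquad (4.70)$$

Then, by definition,

$$\phi_Y(t) = E\{e^{jtY}\} = E\{e^{jt(X_1 + X_2 + \cdots + X_n)}\} = E\{e^{jtX_1}e^{jtX_2}\cdots e^{jtX_n}\}.$$

Since $X_1, X_2, \ldots, X_n$ are mutually independent, Equation (4.36) leads to

$$E\{e^{jtX_1}e^{jtX_2}\cdots e^{jtX_n}\} = E\{e^{jtX_1}\}E\{e^{jtX_2}\}\cdots E\{e^{jtX_n}\}.$$

We thus have

$$\phi_Y(t) = \phi_{X_1}(t)\phi_{X_2}(t)\cdots\phi_{X_n}(t), \qquad (4.71)$$

which was to be proved.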
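Theorem 4.3 can be checked for discrete variables by computing the distribution of a sum directly. This sketch convolves two hypothetical independent mass functions to get the mass function of $X_1 + X_2$. It then compares the characteristic function of the sum with the product in Equation (4.71) at a few arbitrary values of t.

```python
import cmath

def char_fn(pmf, t):
    """phi_X(t) = sum over x of exp(j t x) P(X = x), Eq. (4.46)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def pmf_of_sum(p1, p2):
    """pmf of X1 + X2 for independent X1 and X2 (discrete convolution)."""
    out = {}
    for x1, a in p1.items():
        for x2, b in p2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + a * b
    return out

p1 = {0: 0.5, 1: 0.5}             # hypothetical pmfs
p2 = {-1: 0.1, 0: 0.6, 3: 0.3}
p_sum = pmf_of_sum(p1, p2)
ts = [0.0, 0.4, 1.3, 2.9]
max_gap = max(abs(char_fn(p_sum, t) - char_fn(p1, t) * char_fn(p2, t)) for t in ts)
print(max_gap)   # zero up to rounding error
```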

In Section 4.4, we obtained moments of a sum of random variables; Equation (4.71), coupled with the inversion formula in Equation (4.58) or Equation (4.64), enables us to determine the distribution of a sum of random variables from the knowledge of the distributions of $X_j$, j = 1, 2, ..., n, provided that they are mutually independent.

Example 4.16. Problem: let $X_1$ and $X_2$ be two independent random variables, both having an exponential distribution with parameter a, and let $Y = X_1 + X_2$. Determine the distribution of Y.

Answer: the characteristic function of an exponentially distributed random variable was obtained in Example 4.15. From Equation (4.54), we have

$$\phi_{X_1}(t) = \phi_{X_2}(t) = \frac{a}{a - jt}.$$

According to Equation (4.71), the characteristic function of Y is simply

$$\phi_Y(t) = \phi_{X_1}(t)\phi_{X_2}(t) = \frac{a^2}{(a - jt)^2}.$$

Hence, by means of the inversion formula in Equation (4.58),

$$f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jty}\frac{a^2}{(a - jt)^2}\,dt = \begin{cases} a^2 y e^{-ay}, & \text{for } y \ge 0; \\ 0, & \text{elsewhere}. \end{cases} \qquad (4.72)$$

The distribution given by Equation (4.72) is called a gamma distribution, which will be discussed extensively in Section 7.4.
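A Monte Carlo sketch of Example 4.16: it sums two simulated exponential variables and compares the empirical distribution function of Y with the one obtained by integrating the density $a^2 y e^{-ay}$, namely $1 - e^{-ay}(1 + ay)$ for y ≥ 0. The rate a = 2, the seed, and the sample size are arbitrary. Note that `random.expovariate` takes the rate parameter, so its density is $ae^{-ax}$.

```python
import math
import random

a = 2.0                        # hypothetical parameter value
rng = random.Random(99)
reps = 200_000
ys = [rng.expovariate(a) + rng.expovariate(a) for _ in range(reps)]   # Y = X1 + X2

def gamma2_cdf(y):
    """Integral of a^2 s exp(-a s) from 0 to y: the distribution function implied by the density of Y."""
    return 1.0 - math.exp(-a * y) * (1.0 + a * y) if y > 0 else 0.0

for y0 in (0.25, 0.5, 1.0, 2.0):
    empirical = sum(y <= y0 for y in ys) / reps
    print(y0, round(empirical, 4), round(gamma2_cdf(y0), 4))
```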

Example 4.17. In 1827, Robert Brown, a Scottish botanist, noticed that small particles of matter from plants undergo erratic movements when suspended in fluids. It was soon discovered that the erratic motion was caused by impacts on the particles by the molecules of the fluid in which they were suspended. This phenomenon, which can also be observed in gases, is called Brownian motion. The explanation of Brownian motion was one of the major successes of statistical mechanics. In this example, we study Brownian motion in an elementary way by using a one-dimensional random walk as an adequate mathematical model.

Consider a particle taking steps on a straight line. It moves either one step to the right with probability p, or one step to the left with probability q (p + q = 1). The steps are always of unit length, positive to the right and negative to the left, and they are taken independently. We wish to determine the probability mass function of its position after n steps.

Let $X_i$ be the random variable associated with the ith step and define

$$X_i = \begin{cases} 1, & \text{if it is to the right}; \\ -1, & \text{if it is to the left}. \end{cases} \qquad (4.73)$$

Then random variable Y, defined by

$$Y = X_1 + X_2 + \cdots + X_n,$$

gives the position of the particle after n steps. It is clear that Y takes integer values between -n and n.

To determine $p_Y(k)$, $-n \le k \le n$, we first find its characteristic function. The characteristic function of each $X_i$ is

$$\phi_{X_i}(t) = E\{e^{jtX_i}\} = pe^{jt} + qe^{-jt}. \qquad (4.74)$$

It then follows from Equation (4.71) that, in view of independence,

$$\phi_Y(t) = \phi_{X_1}(t)\phi_{X_2}(t)\cdots\phi_{X_n}(t) = (pe^{jt} + qe^{-jt})^n. \qquad (4.75)$$

Let us rewrite it as

$$\phi_Y(t) = e^{-jnt}(pe^{2jt} + q)^n = \sum_{i=0}^{n}\binom{n}{i}p^i q^{n-i}e^{j(2i - n)t}.$$

Letting k = 2i - n, we get

$$\phi_Y(t) = \sum_{k=-n,\,-(n-2),\,\ldots}^{n}\binom{n}{(n + k)/2}p^{(n+k)/2}q^{(n-k)/2}e^{jkt}. \qquad (4.76)$$

Comparing Equation (4.76) with the definition in Equation (4.46) yields the mass function

$$p_Y(k) = \binom{n}{(n + k)/2}p^{(n+k)/2}q^{(n-k)/2}, \quad k = -n, -(n - 2), \ldots, n. \qquad (4.77)$$

Note that, if n is even, k must also be even, and, if n is odd, k must be odd.
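Equation (4.77) can be cross-checked without characteristic functions. This sketch builds the position distribution step by step, adding one independent ±1 step at a time, and compares it with the closed-form mass function. It also checks that the mean equals $n(p - q) = n(2p - 1)$. The values n = 15 and p = 0.3 are hypothetical.

```python
from math import comb

def walk_pmf_formula(n, p):
    """p_Y(k) from Eq. (4.77), for k = -n, -n + 2, ..., n."""
    q = 1 - p
    return {k: comb(n, (n + k) // 2) * p ** ((n + k) // 2) * q ** ((n - k) // 2)
            for k in range(-n, n + 1, 2)}

def walk_pmf_by_steps(n, p):
    """The same pmf built by adding n independent +1/-1 steps one at a time."""
    dist = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for k, pr in dist.items():
            nxt[k + 1] = nxt.get(k + 1, 0.0) + pr * p
            nxt[k - 1] = nxt.get(k - 1, 0.0) + pr * (1 - p)
        dist = nxt
    return dist

n, p = 15, 0.3                          # hypothetical values
formula = walk_pmf_formula(n, p)
by_steps = walk_pmf_by_steps(n, p)
mean = sum(k * pk for k, pk in formula.items())
print(max(abs(formula[k] - by_steps[k]) for k in formula), mean, n * (2 * p - 1))
```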

Considerable importance is attached to the symmetric case in which k <^n, 
and p = q = 1/2. In order to consider this special case, we need to use Stirling’s 
formula, which states that, for large n. 


n\ ^ (27r)‘/^e-"«"+5 


Substituting this approximation into Equation (All) gives 



(4.78) 


(4.79) 


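A further simplification results when the length of each step is small. Assuming that r steps occur in a unit time (i.e. n = rt) and letting a be the length of each step, then, as n becomes large, random variable Y approaches a continuous random variable, and we can show that Equation (4.79) becomes 

$$f_Y(y) = \frac{1}{(2\pi r t a^2)^{1/2}}\exp\left(-\frac{y^2}{2rta^2}\right), \qquad -\infty < y < \infty, \qquad (4.80)$$

where y = ka; since the possible values of Y are 2a apart, $f_Y(y)$ is approximately $p_Y(k)/(2a)$. On letting 

$$D = ra^2,$$

we have 

$$f_Y(y) = \frac{1}{(2\pi D t)^{1/2}}\exp\left(-\frac{y^2}{2Dt}\right), \qquad -\infty < y < \infty. \qquad (4.81)$$

The probability density function given above belongs to a Gaussian or normal random variable. This result is an illustration of the central limit theorem, to be discussed in Section 7.2. 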
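A quick Monte Carlo sketch of the continuous limit: simulate the symmetric walk with n = rt steps of length a and compare the sample mean and variance of Y with 0 and $n a^2 = rta^2$, the variance of a sum of n independent steps of length a, which the limiting normal density must reproduce. The values r = 400, t = 1, a = 0.05, the seed and the trial count are assumptions chosen only for illustration; drawing all n fair steps at once as the bits of one random integer is a shortcut that relies on p = q = 1/2.

```python
import random


def simulate_positions(r, t, a, trials, seed=1):
    """Positions y = k a of a symmetric walk after n = r t steps of length a."""
    rng = random.Random(seed)
    n = int(r * t)
    positions = []
    for _ in range(trials):
        rights = bin(rng.getrandbits(n)).count("1")  # each random bit is one fair step
        positions.append(a * (2 * rights - n))       # k = 2i - n, i = number of right steps
    return positions


r, t, a = 400, 1.0, 0.05
ys = simulate_positions(r, t, a, trials=20000)
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
print(f"sample mean {mean:.4f}, sample variance {var:.4f}, r t a^2 = {r * t * a * a:.4f}")
```

A histogram of `ys` divided by the bin width should likewise follow a normal density with that variance.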
Our derivation of Equation (4.81) has been purely analytical. In his theory of Brownian motion, Einstein also obtained this result with 

$$D = \frac{2RT}{Nf}, \qquad (4.82)$$

where R is the universal gas constant, T is the absolute temperature, N is Avogadro's number, and f is the coefficient of friction which, for liquid or gas at ordinary pressure, can be expressed in terms of its viscosity and particle size. Perrin, a French physicist, was awarded the Nobel Prize in 1926 for his success in determining, from experiment, Avogadro's number. 


4.5.3 JOINT CHARACTERISTIC FUNCTIONS 

The concept of characteristic functions also finds usefulness in the case of two or more random variables. The development below is concerned with continuous random variables only, but the principal results are equally valid in the case of discrete random variables. We also omit most of the derivations involved, since they closely follow those developed for the single-random-variable case. 

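The joint characteristic function of two random variables X and Y, $\phi_{XY}(t,s)$, is defined by 

$$\phi_{XY}(t,s) = E\{e^{j(tX+sY)}\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{j(tx+sy)} f_{XY}(x,y)\,dx\,dy, \qquad (4.83)$$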


where t and s are two arbitrary real variables. This function always exists, and some of its properties, similar to those noted in Equations (4.48) for the single-random-variable case, are: 

$$\phi_{XY}(0,0) = 1, \qquad \phi_{XY}(-t,-s) = \phi^{*}_{XY}(t,s), \qquad |\phi_{XY}(t,s)| \le 1. \qquad (4.84)$$


Furthermore, it is easy to verify that joint characteristic function $\phi_{XY}(t,s)$ is related to marginal characteristic functions $\phi_X(t)$ and $\phi_Y(s)$ by 

$$\phi_X(t) = \phi_{XY}(t,0), \qquad \phi_Y(s) = \phi_{XY}(0,s). \qquad (4.85)$$
If random variables X and Y are independent, then we also have 

$$\phi_{XY}(t,s) = \phi_X(t)\phi_Y(s). \qquad (4.86)$$

To show the above, we simply substitute $f_X(x)f_Y(y)$ for $f_{XY}(x,y)$ in Equation (4.83). The double integral on the right-hand side separates, and we have 

$$\phi_{XY}(t,s) = \int_{-\infty}^{\infty} e^{jtx} f_X(x)\,dx \int_{-\infty}^{\infty} e^{jsy} f_Y(y)\,dy = \phi_X(t)\phi_Y(s),$$

and we have the desired result. 

Analogous to the one-random-variable case, joint characteristic function $\phi_{XY}(t,s)$ is often called on to determine the joint density function of X and Y and their joint moments. The density function $f_{XY}(x,y)$ is uniquely determined in terms of $\phi_{XY}(t,s)$ by the two-dimensional Fourier transform 

$$f_{XY}(x,y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-j(tx+sy)}\phi_{XY}(t,s)\,dt\,ds; \qquad (4.87)$$


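and moments $E\{X^nY^m\} = \alpha_{nm}$, if they exist, are related to $\phi_{XY}(t,s)$ by 

$$\left.\frac{\partial^{n+m}}{\partial t^n\,\partial s^m}\phi_{XY}(t,s)\right|_{t,s=0} = j^{n+m}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^n y^m f_{XY}(x,y)\,dx\,dy = j^{n+m}\alpha_{nm}. \qquad (4.88)$$

The Maclaurin series expansion of $\phi_{XY}(t,s)$ thus takes the form 

$$\phi_{XY}(t,s) = \sum_{i=0}^{\infty}\sum_{k=0}^{\infty} \frac{\alpha_{ik}}{i!\,k!}(jt)^i(js)^k. \qquad (4.89)$$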


The above development can be generalized to the case of more than two 
random variables in an obvious manner. 

Example 4.18. Let us consider again the Brownian motion problem discussed in Example 4.17, and form two random variables X' and Y' as 

$$X' = X_1 + X_2 + \cdots + X_{2n}, \qquad Y' = X_{n+1} + X_{n+2} + \cdots + X_{3n}. \qquad (4.90)$$
They are, respectively, the position of the particle after 2n steps and its position after 3n steps relative to where it was after n steps. We wish to determine the joint probability density function $f_{XY}(x,y)$ of random variables 

$$X = \frac{X'}{n^{1/2}} \qquad\text{and}\qquad Y = \frac{Y'}{n^{1/2}}$$

for large values of n. 

For the simple case of $p = q = 1/2$, the characteristic function of each $X_k$ is [see Equation (4.74)] 

$$\phi(t) = \tfrac{1}{2}(e^{jt} + e^{-jt}) = \cos t, \qquad (4.91)$$

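and, following Equation (4.83), the joint characteristic function of X and Y is 

$$\phi_{XY}(t,s) = E\{\exp[j(tX+sY)]\} = E\left\{\exp\left[\frac{j}{n^{1/2}}\left(t\sum_{k=1}^{n}X_k + (t+s)\sum_{k=n+1}^{2n}X_k + s\sum_{k=2n+1}^{3n}X_k\right)\right]\right\} = \phi^n\!\left(\frac{t}{n^{1/2}}\right)\phi^n\!\left(\frac{t+s}{n^{1/2}}\right)\phi^n\!\left(\frac{s}{n^{1/2}}\right), \qquad (4.92)$$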

where $\phi(t)$ is given by Equation (4.91). The last expression in Equation (4.92) is obtained based on the fact that the $X_k$, $k = 1, 2, \ldots, 3n$, are mutually independent. It should be clear, however, that X and Y are not independent, since both contain the steps $X_{n+1}, \ldots, X_{2n}$. 

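We are now in the position to obtain $f_{XY}(x,y)$ from Equation (4.92) by using the inverse formula given by Equation (4.87). First, however, some simplifications are in order. As n becomes large, 

$$\phi^n\!\left(\frac{t}{n^{1/2}}\right) = \cos^n\!\left(\frac{t}{n^{1/2}}\right) = \left(1 - \frac{t^2}{2!\,n} + \frac{t^4}{4!\,n^2} - \cdots\right)^n \to e^{-t^2/2}. \qquad (4.93)$$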
Hence, as $n \to \infty$, 

$$\phi_{XY}(t,s) \to e^{-t^2/2}\,e^{-(t+s)^2/2}\,e^{-s^2/2} = e^{-(t^2+ts+s^2)}. \qquad (4.94)$$
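The limit (4.94) can be checked numerically. The Python sketch below evaluates Equation (4.92) with $\phi(u) = \cos u$ for increasing n and compares it with $e^{-(t^2+ts+s^2)}$; the point (t, s) = (0.7, -0.4) and the values of n are arbitrary choices.

```python
from math import cos, exp, sqrt


def phi_xy_finite(t, s, n):
    """Equation (4.92) with phi(u) = cos u, i.e. p = q = 1/2."""
    def phi_n(u):
        return cos(u / sqrt(n)) ** n
    return phi_n(t) * phi_n(t + s) * phi_n(s)


def phi_xy_limit(t, s):
    """Equation (4.94): the n -> infinity limit."""
    return exp(-(t * t + t * s + s * s))


t, s = 0.7, -0.4
for n in (10, 100, 1000, 10000):
    finite = phi_xy_finite(t, s, n)
    err = abs(finite - phi_xy_limit(t, s))
    print(f"n={n:6d}  finite={finite:.6f}  limit={phi_xy_limit(t, s):.6f}  error={err:.1e}")
```

The error shrinks roughly in proportion to 1/n, consistent with the neglected $t^4/(4!\,n^2)$ terms in Equation (4.93).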

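Now, substituting Equation (4.94) into Equation (4.87) gives 

$$f_{XY}(x,y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-j(tx+sy)}e^{-(t^2+ts+s^2)}\,dt\,ds, \qquad (4.95)$$

which can be evaluated following a change of variables defined by 

$$t = \frac{t'+s'}{\sqrt{2}}, \qquad s = \frac{t'-s'}{\sqrt{2}}. \qquad (4.96)$$

The result is 

$$f_{XY}(x,y) = \frac{1}{2\sqrt{3}\,\pi}\exp\left[-\frac{1}{3}(x^2 - xy + y^2)\right], \qquad -\infty < x, y < \infty. \qquad (4.97)$$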
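A Monte Carlo sketch of Example 4.18 makes the dependence between X and Y visible. It simulates the three blocks of n steps, forms $X = X'/n^{1/2}$ and $Y = Y'/n^{1/2}$, and estimates $E\{X^2\}$, $E\{Y^2\}$ and $E\{XY\}$, which for the density (4.97) should be close to 2, 2 and 1 (the values obtained from Equation (4.88) below). The values n = 300, the trial count and the seed are illustrative assumptions; the bit-counting shortcut relies on p = q = 1/2.

```python
import random
from math import sqrt


def block_sum(rng, n):
    """Sum of n independent +/-1 steps with p = q = 1/2."""
    return 2 * bin(rng.getrandbits(n)).count("1") - n


def simulate_xy(n, trials, seed=7):
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        a = block_sum(rng, n)  # steps 1..n
        b = block_sum(rng, n)  # steps n+1..2n, shared by X' and Y'
        c = block_sum(rng, n)  # steps 2n+1..3n
        pairs.append(((a + b) / sqrt(n), (b + c) / sqrt(n)))
    return pairs


pairs = simulate_xy(n=300, trials=20000)
m = len(pairs)
ex2 = sum(x * x for x, _ in pairs) / m
ey2 = sum(y * y for _, y in pairs) / m
exy = sum(x * y for x, y in pairs) / m
print(f"E{{X^2}} ~ {ex2:.3f}, E{{Y^2}} ~ {ey2:.3f}, E{{XY}} ~ {exy:.3f}")
```

The shared block `b` is the only source of the correlation; setting it to zero in the code would make the estimated $E\{XY\}$ drop to about 0.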


The above is an example of a bivariate normal distribution, to be discussed in 
Section 7.2.3. 

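Incidentally, the joint moments of X and Y can be readily found by means of Equation (4.88). For large n, the means of X and Y, $\alpha_{10}$ and $\alpha_{01}$, are 

$$\alpha_{10} = -j\left.\frac{\partial\phi_{XY}(t,s)}{\partial t}\right|_{t,s=0} = -j\left[-(2t+s)e^{-(t^2+ts+s^2)}\right]_{t,s=0} = 0,$$

$$\alpha_{01} = -j\left.\frac{\partial\phi_{XY}(t,s)}{\partial s}\right|_{t,s=0} = 0.$$

Similarly, the second moments are 

$$\alpha_{20} = E\{X^2\} = -\left.\frac{\partial^2\phi_{XY}(t,s)}{\partial t^2}\right|_{t,s=0} = 2,$$

$$\alpha_{02} = E\{Y^2\} = -\left.\frac{\partial^2\phi_{XY}(t,s)}{\partial s^2}\right|_{t,s=0} = 2,$$

$$\alpha_{11} = E\{XY\} = -\left.\frac{\partial^2\phi_{XY}(t,s)}{\partial t\,\partial s}\right|_{t,s=0} = 1.$$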
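The derivatives in Equation (4.88) can also be approximated by central finite differences, which is a handy cross-check when $\phi_{XY}$ is complicated. The Python sketch below applies this to Equation (4.94); the step size h is an assumed value that balances truncation error against rounding error.

```python
from math import exp


def phi(t, s):
    """Limiting joint characteristic function, Equation (4.94)."""
    return exp(-(t * t + t * s + s * s))


h = 1e-4
# First derivatives: d(phi)/dt = j alpha_10. They vanish here, so alpha_10 = alpha_01 = 0.
d_t = (phi(h, 0) - phi(-h, 0)) / (2 * h)
d_s = (phi(0, h) - phi(0, -h)) / (2 * h)
# Second derivatives: each equals j^2 alpha = -alpha, so alpha = -(second difference).
a20 = -(phi(h, 0) - 2 * phi(0, 0) + phi(-h, 0)) / h ** 2
a02 = -(phi(0, h) - 2 * phi(0, 0) + phi(0, -h)) / h ** 2
a11 = -(phi(h, h) - phi(h, -h) - phi(-h, h) + phi(-h, -h)) / (4 * h * h)
print(d_t, d_s, round(a20, 6), round(a02, 6), round(a11, 6))  # approximately 0, 0, 2, 2, 1
```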
FURTHER READING AND COMMENTS 

As mentioned in Section 4.2, the Chebyshev inequality can be improved upon if some additional distribution features of a random variable are known beyond its first two moments. Some generalizations can be found in: 

Mallows, C.L., 1956, 'Generalizations of Tchebycheff's Inequalities', J. Royal Statistical Society, Series B 18, 139-176. 

In many introductory texts, the discussion of characteristic functions of random variables is bypassed in favor of moment-generating functions. The moment-generating function $M_X(t)$ of a random variable X is defined by 

$$M_X(t) = E\{e^{tX}\}.$$

In comparison with characteristic functions, the use of $M_X(t)$ is simpler since it avoids computations involving complex numbers, and it generates moments of X in a similar fashion. However, there are two disadvantages in using $M_X(t)$. The first is that it may not exist for all values of t, whereas $\phi_X(t)$ always exists. In addition, the powerful inversion formulae associated with characteristic functions no longer exist for moment-generating functions. For a discussion of the moment-generating function, see, for example: 

Meyer, P.L., 1970, Introductory Probability and Statistical Applications, 2nd edn, Addison-Wesley, Reading, Mass., pp. 210-217. 


PROBLEMS 

4.1 For each of the probability distribution functions (PDFs) given in Problem 3.1 (page 67), determine the mean and variance, if they exist, of its associated random variable. 

4.2 For each of the probability density functions (pdfs) given in Problem 3.4, determine 
the mean and variance, if they exist, of its associated random variable. 

4.3 According to the PDF given in Example 3.4 (page 47), determine the average 
duration of a long-distance telephone call. 

4.4 It is found that resistance of aircraft structural parts, R, in a nondimensionalized form, follows the distribution 

$$f_R(r) = \begin{cases} \dfrac{2\sigma^3}{0.9996\,\pi\,[\sigma^2 + (r-1)^2]^2}, & \text{for } r > 0.33; \\ 0, & \text{elsewhere;} \end{cases}$$

where σ = 0.0564. Determine the mean of R. 

4.5 A target is made of three concentric circles of radii $3^{-1/2}$, 1, and $3^{1/2}$ feet. Shots within the inner circle count 4 points, within the next ring 3 points, and within the third ring 2 points. Shots outside of the target count 0. Let R be the random variable representing distance of the hit from the center. Suppose that the pdf of R is 

$$f_R(r) = \begin{cases} \dfrac{2}{\pi(1 + r^2)}, & \text{for } r > 0; \\ 0, & \text{elsewhere.} \end{cases}$$

Compute the mean score of each shot. 

4.6 A random variable X has the exponential distribution 

$$f_X(x) = \begin{cases} a e^{-x/2}, & \text{for } x > 0; \\ 0, & \text{elsewhere.} \end{cases}$$

Determine: 

(a) The value of a. 

(b) The mean and variance of X. 

(c) The mean and variance of $Y = (X/2) - 1$. 

4.7 Let the mean and variance of X be m and $\sigma^2$, respectively. For what values of a and b does random variable Y, equal to aX + b, have mean 0 and variance 1? 

4.8 Suppose that your waiting time (in minutes) for a bus in the morning is uniformly 
distributed over (0, 5), whereas your waiting time in the evening is distributed as 
shown in Figure 4.4. These waiting times are assumed to be independent for any 
given day and from day to day. 

(a) If you take the bus each morning and evening for five days, what is the mean of 
your total waiting time? 

(b) What is the variance of your total five-day waiting time? 

(c) What are the mean and variance of the difference between morning and evening 
waiting times on a given day? 

(d) What are the mean and variance of the difference between total morning waiting time and total evening waiting time for five days? 


Figure 4.4 Density function of evening waiting times, for Problem 4.8 
4.9 The diameter of an electronic cable, say X, is random, with pdf 

$$f_X(x) = \begin{cases} 6x(1-x), & \text{for } 0 < x < 1; \\ 0, & \text{elsewhere.} \end{cases}$$

(a) What is the mean value of the diameter? 

(b) What is the mean value of the cross-sectional area, $(\pi/4)X^2$? 

4.10 Suppose that a random variable X is distributed (arbitrarily) over the interval $a \le x \le b$. Show that: 

(a) $m_X$ is bounded by the same limits; 

(b) $\sigma_X^2 \le \dfrac{(b-a)^2}{4}$. 

4.11 Show that, given a random variable X, $P(X = m_X) = 1$ if $\sigma_X^2 = 0$. 

4.12 The waiting time T of a customer at an airline ticket counter can be characterized by a mixed distribution function (see Figure 4.5): 

$$F_T(t) = \begin{cases} 0, & \text{for } t < 0; \\ p + (1-p)(1 - e^{-\lambda t}), & \text{for } t \ge 0. \end{cases}$$

Determine: 

(a) The average waiting time of an arrival, E{T}. 

(b) The average waiting time for an arrival given that a wait is required, E{T | T > 0}. 

4.13 For the commuter described in Problem 3.21 (page 72), assuming that he or she 
makes one of the trains, what is the average arrival time at the destination? 

4.14 A trapped miner has to choose one of two directions to find safety. If the miner goes to the right, then he will return to his original position after 3 minutes. If he goes to the left, he will with probability 1/3 reach safety and with probability 2/3 return to his original position after 5 minutes of traveling. Assuming that he is at all times equally likely to choose either direction, determine the average time interval (in minutes) that the miner will be trapped. 

Figure 4.5 Distribution function, $F_T(t)$, of waiting times, for Problem 4.12 

4.15 Show that: 

(a) E{X | Y = y} = E{X} if X and Y are independent. 

(b) E{XY | Y = y} = yE{X | Y = y}. 

(c) E{XY} = E{YE{X | Y}}. 

4.16 Let random variable X be uniformly distributed over interval 0 < x < 2. Determine a lower bound for $P(|X - 1| < 0.75)$ using the Chebyshev inequality and compare it with the exact value of this probability. 

4.17 For random variable X defined in Problem 4.16, plot $P(|X - m_X| < h)$ as a function of h and compare it with its lower bound as determined by the Chebyshev inequality. Show that the lower bound becomes a better approximation of $P(|X - m_X| < h)$ as h becomes large. 

4.18 Let a random variable X take only nonnegative values; show that, for any a > 0, 

$$P(X \ge a) \le \frac{m_X}{a}.$$

This is known as Markov's inequality. 

4.19 The yearly snowfall of a given region is a random variable with mean equal to 70 
inches. 

(a) What can be said about the probability that this year’s snowfall will be 
between 55 and 85 inches? 

(b) Can your answer be improved if, in addition, the standard deviation is known 
to be 10 inches? 


4.20 The number X of airplanes arriving at an airport during a given period of time is distributed according to 

$$p_X(k) = \frac{100^k e^{-100}}{k!}, \qquad k = 0, 1, 2, \ldots.$$

Use the Chebyshev inequality to determine a lower bound for probability P(80 < X < 120) during this period of time. 

4.21 For each joint distribution given in Problem 3.13 (page 71), determine $m_X$, $m_Y$, $\sigma_X^2$, $\sigma_Y^2$, and $\rho_{XY}$ of random variables X and Y. 

4.22 In the circuit shown in Figure 4.6, the resistance R is random and uniformly distributed between 900 and 1100 Ω. The current i = 0.01 A and the resistance $r_0$ = 1000 Ω are constants. 

Figure 4.6 Circuit diagram for Problem 4.22 
(a) Determine $m_V$ and $\sigma_V^2$ of voltage V, which is given by 

$$V = (R + r_0)i.$$

(b) Determine the correlation coefficient of R and V. 

4.23 Let the jpdf of X and Y be given by 

$$f_{XY}(x,y) = \begin{cases} xy, & \text{for } 0 < x < 1 \text{ and } 0 < y < 2; \\ 0, & \text{elsewhere.} \end{cases}$$

Determine the mean of Z, equal to $(X^2 + Y^2)^{1/2}$. 

4.24 The product of two random variables X and Y occurs frequently in applied problems. Let Z = XY and assume that X and Y are independent. Determine the mean and variance of Z in terms of $m_X$, $m_Y$, $\sigma_X^2$, and $\sigma_Y^2$. 

4.25 Let $X = X_1 + X_2$ and $Y = X_2 + X_3$. Determine correlation coefficient $\rho_{XY}$ of X and Y in terms of $\sigma_{X_1}$, $\sigma_{X_2}$, and $\sigma_{X_3}$ when $X_1$, $X_2$, and $X_3$ are uncorrelated. 

4.26 Let X and Y be discrete random variables with joint probability mass function (jpmf) given by Table 4.1. Show that $\rho_{XY} = 0$ but X and Y are not independent. 

Table 4.1 Joint probability mass function, $p_{XY}(x,y)$, for Problem 4.26 

| y \ x | -1 | 0 | 1 |
|---|---|---|---|
| -1 | a | b | a |
| 0 | b | 0 | b |
| 1 | a | b | a |

Note: a + b = 1/4. 

4.27 In a simple frame structure such as the one shown in Figure 4.7, the total horizontal displacement of top storey Y is the sum of the displacements of individual storeys $X_1$ and $X_2$. Assume that $X_1$ and $X_2$ are independent and let $m_{X_1}$, $m_{X_2}$, $\sigma_{X_1}^2$, and $\sigma_{X_2}^2$ be their respective means and variances. 

(a) Find the mean and variance of Y. 

(b) Find the correlation coefficient between $X_2$ and Y. Discuss the result if $\sigma_{X_2}^2 \gg \sigma_{X_1}^2$. 

4.28 Let $X_1, \ldots, X_n$ be a set of independent random variables, each of which has a probability density function (pdf) of the form 

$$f_{X_j}(x_j) = \frac{1}{(2\pi)^{1/2}} e^{-x_j^2/2}, \qquad j = 1, 2, \ldots, n, \quad -\infty < x_j < \infty.$$

Determine the mean and variance of Y, where 

$$Y = \sum_{j=1}^{n} X_j^2.$$
Figure 4.7 Frame structure, for Problem 4.27 

4.29 Let $X_1, X_2, \ldots, X_n$ be independent random variables and let $\sigma_j^2$ and $\mu_j$ be the respective variance and third central moment of $X_j$. Let $\sigma^2$ and $\mu$ denote the corresponding quantities for Y, where $Y = X_1 + X_2 + \cdots + X_n$. 

(a) Show that $\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$ and $\mu = \mu_1 + \mu_2 + \cdots + \mu_n$. 

(b) Show that this additive property does not apply to the fourth-order or higher-order central moments. 

4.30 Determine the characteristic function corresponding to each of the PDFs given in 
Problem 3.1(a)-3.1(e) (page 67). Use it to generate the first two moments and 
compare them with results obtained in Problem 4.1. [Let a = 2 in part (e).] 

4.31 We have shown that characteristic function $\phi_X(t)$ of random variable X facilitates the determination of the moments of X. Another function defined by 

$$M_X(t) = E\{e^{tX}\},$$

and called the moment-generating function of X, can also be used to obtain moments of X. Derive the relationships between $M_X(t)$ and the moments of X. 

4.32 Let 

$$Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n,$$

where $X_1, X_2, \ldots, X_n$ are mutually independent. Show that 

$$\phi_Y(t) = \phi_{X_1}(a_1t)\,\phi_{X_2}(a_2t)\cdots\phi_{X_n}(a_nt).$$
Functions of Random Variables 


The basic topic to be discussed in this chapter is one of determining the relationship between probability distributions of two random variables X and Y when they are related by Y = g(X). The functional form of g(X) is given and deterministic. Generalizing to the case of many random variables, we are interested in the determination of the joint probability distribution of $Y_j$, $j = 1, 2, \ldots, m$, which is functionally dependent on $X_k$, $k = 1, 2, \ldots, n$, according to 

$$Y_j = g_j(X_1, \ldots, X_n), \qquad j = 1, 2, \ldots, m, \quad m \le n, \qquad (5.1)$$

when the joint probabilistic behavior of $X_k$, $k = 1, 2, \ldots, n$, is known. 

Some problems of this type (i.e. transformations of random variables) have been addressed in several places in Chapter 4. For example, Example 4.11 considers transformation $Y = X_1 + \cdots + X_n$, and Example 4.18 deals with the transformation of 3n random variables $(X_1, X_2, \ldots, X_{3n})$ to two random variables $(X', Y')$ defined by Equations (4.90). In science and engineering, most phenomena are based on functional relationships in which one or more dependent variables are expressed in terms of one or more independent variables. For example, force is a function of cross-sectional area and stress, distance traveled over a time interval is a function of the velocity, and so on. The techniques presented in this chapter thus permit us to determine the probabilistic behavior of random variables that are functionally dependent on some others with known probabilistic properties. 

In what follows, transformations of random variables are treated in a systematic manner. In Equation (5.1), we are basically interested in the joint distributions and joint moments of $Y_1, \ldots, Y_m$, given appropriate information on $X_1, \ldots, X_n$. 


5.1 FUNCTIONS OF ONE RANDOM VARIABLE 

Consider first a simple transformation involving only one random variable, and let 

$$Y = g(X) \qquad (5.2)$$
where g(X) is assumed to be a continuous function of X. Given the probability distribution of X in terms of its probability distribution function (PDF), probability mass function (pmf) or probability density function (pdf), we are interested in the corresponding distribution for Y and its moment properties. 


5.1.1 PROBABILITY DISTRIBUTION 

Given the probability distribution of X, the quantity Y, being a function of X as defined by Equation (5.2), is thus also a random variable. Let $R_X$ be the range space associated with random variable X, defined as the set of all possible values assumed by X, and let $R_Y$ be the corresponding range space associated with Y. A basic procedure for determining the probability distribution of Y consists of the steps developed below. 

For any outcome such as X = x, it follows from Equation (5.2) that Y = y = g(x). As shown schematically in Figure 5.1, Equation (5.2) defines a mapping of values in range space $R_X$ into corresponding values in range space $R_Y$. Probabilities associated with each point (in the case of discrete random variable X) or with each region (in the case of continuous random variable X) in $R_X$ are carried over to the corresponding point or region in $R_Y$. The probability distribution of Y is determined on completing this transfer process for every point or every region of nonzero probability in $R_X$. Note that many-to-one transformations are possible, as also shown in Figure 5.1. The procedure of determining the probability distribution of Y is thus critically dependent on the functional form of g in Equation (5.2). 
5.1.1.1 Discrete Random Variables 


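Let us first dispose of the case when X is a discrete random variable, since it requires only simple point-to-point mapping. Suppose that the possible values taken by X can be enumerated as $x_1, x_2, \ldots$. Equation (5.2) shows that the corresponding possible values of Y may be enumerated as $y_1 = g(x_1), y_2 = g(x_2), \ldots$. Let the pmf of X be given by 

$$p_X(x_i) = p_i, \qquad i = 1, 2, \ldots. \qquad (5.3)$$

The pmf of Y is simply determined as 

$$p_Y(y_i) = p_Y[g(x_i)] = p_i, \qquad i = 1, 2, \ldots. \qquad (5.4)$$

Equation (5.4) assumes that distinct $x_i$ map to distinct $y_i$; when several $x_i$ map to the same value of y, their probabilities are added, as Example 5.2 shows. 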


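Example 5.1. Problem: the pmf of a random variable X is given as 

$$p_X(x) = \begin{cases} \tfrac{1}{2}, & \text{for } x = -1; \\ \tfrac{1}{4}, & \text{for } x = 0; \\ \tfrac{1}{8}, & \text{for } x = 1; \\ \tfrac{1}{8}, & \text{for } x = 2. \end{cases}$$

Determine the pmf of Y if Y is related to X by Y = 2X + 1. 

Answer: the corresponding values of Y are: g(-1) = 2(-1) + 1 = -1; g(0) = 1; g(1) = 3; and g(2) = 5. Hence, the pmf of Y is given by 

$$p_Y(y) = \begin{cases} \tfrac{1}{2}, & \text{for } y = -1; \\ \tfrac{1}{4}, & \text{for } y = 1; \\ \tfrac{1}{8}, & \text{for } y = 3; \\ \tfrac{1}{8}, & \text{for } y = 5. \end{cases}$$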


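Example 5.2. Problem: for the same X as given in Example 5.1, determine the pmf of Y if $Y = 2X^2 + 1$. 

Answer: in this case, the corresponding values of Y are: $g(-1) = 2(-1)^2 + 1 = 3$; g(0) = 1; g(1) = 3; and g(2) = 9, resulting in 

$$p_Y(y) = \begin{cases} \tfrac{1}{4}, & \text{for } y = 1; \\ \tfrac{5}{8} \left(= \tfrac{1}{2} + \tfrac{1}{8}\right), & \text{for } y = 3; \\ \tfrac{1}{8}, & \text{for } y = 9. \end{cases}$$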
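The point-to-point mapping of Examples 5.1 and 5.2, including the rule that probabilities of all $x_i$ sharing the same image are added, can be sketched in a few lines of Python; `Fraction` keeps the probabilities exact. The helper name `pmf_of_function` is ours, not the book's.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(pmf_x, g):
    """pmf of Y = g(X) for discrete X: probabilities of all x with the same
    image g(x) are summed (this reduces to Equation (5.4) when g is one-to-one)."""
    pmf_y = defaultdict(Fraction)  # Fraction() == 0
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(sorted(pmf_y.items()))


# pmf of X from Example 5.1
pmf_x = {-1: Fraction(1, 2), 0: Fraction(1, 4), 1: Fraction(1, 8), 2: Fraction(1, 8)}

print(pmf_of_function(pmf_x, lambda x: 2 * x + 1))      # Example 5.1
print(pmf_of_function(pmf_x, lambda x: 2 * x * x + 1))  # Example 5.2
```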
5.1.1.2 Continuous Random Variables 

A more frequently encountered case arises when X is continuous with known PDF, $F_X(x)$, or pdf, $f_X(x)$. To carry out the mapping steps as outlined at the beginning of this section, care must be exercised in choosing appropriate corresponding regions in range spaces $R_X$ and $R_Y$, this mapping being governed by the transformation Y = g(X). Thus, the degree of complexity in determining the probability distribution of Y is a function of complexity in the transformation g(X). 

Let us start by considering a simple relationship 

$$Y = g(X) = 2X + 1. \qquad (5.5)$$


The transformation y = g(x) is presented graphically in Figure 5.2. Consider the PDF of Y, $F_Y(y)$; it is defined by 

$$F_Y(y) = P(Y \le y). \qquad (5.6)$$


Figure 5.2 Transformation defined by Equation (5.5) 

The region defined by $Y \le y$ in the range space $R_Y$ covers the heavier portion of the transformation curve, as shown in Figure 5.2, which, in the range space $R_X$, corresponds to the region $g(X) \le y$, or $X \le g^{-1}(y)$, where 

$$g^{-1}(y) = \frac{y-1}{2}$$

is the inverse function of g(x), or the solution for x in Equation (5.5) in terms of y. Hence, 


$$F_Y(y) = P(Y \le y) = P[g(X) \le y] = P[X \le g^{-1}(y)] = F_X[g^{-1}(y)]. \qquad (5.7)$$

Equation (5.7) gives the relationship between the PDF of X and that of Y, our desired result. 

The relationship between the pdfs of X and Y is obtained by differentiating both sides of Equation (5.7) with respect to y. We have: 

$$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{d}{dy}\{F_X[g^{-1}(y)]\} = f_X[g^{-1}(y)]\frac{dg^{-1}(y)}{dy}. \qquad (5.8)$$



It is clear that Equations (5.7) and (5.8) hold not only for the particular transformation given by Equation (5.5) but for all continuous g(x) that are strictly monotonic increasing functions of x, that is, $g(x_2) > g(x_1)$ whenever $x_2 > x_1$. 

Consider now a slightly different situation in which the transformation is given by 

$$Y = g(X) = -2X + 1. \qquad (5.9)$$


Starting again with $F_Y(y) = P(Y \le y)$, and reasoning as before, the region $Y \le y$ in the range space $R_Y$ is now mapped into the region $X \ge g^{-1}(y)$, as indicated in Figure 5.3. Hence, we have in this case 

$$F_Y(y) = P(Y \le y) = P[X \ge g^{-1}(y)] = 1 - P[X < g^{-1}(y)] = 1 - F_X[g^{-1}(y)]. \qquad (5.10)$$

Figure 5.3 Transformation defined by Equation (5.9) 
In comparison with Equation (5.7), Equation (5.10) yields a different relationship between the PDFs of X and Y owing to a different g(X). 

The relationship between the pdfs of X and Y for this case is again obtained by differentiating both sides of Equation (5.10) with respect to y, giving 

$$f_Y(y) = \frac{dF_Y(y)}{dy} = -f_X[g^{-1}(y)]\frac{dg^{-1}(y)}{dy}. \qquad (5.11)$$


Again, we observe that Equations (5.10) and (5.11) hold for all continuous g(x) that are strictly monotonic decreasing functions of x, that is, $g(x_2) < g(x_1)$ whenever $x_2 > x_1$. 
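Equation (5.10) is easy to check by simulation. The Python sketch below assumes, purely for illustration, that X is uniform on (0, 1), so that $F_X(x) = x$ there; it compares the empirical PDF (distribution function) of Y = -2X + 1 with $1 - F_X[g^{-1}(y)]$. The seed, sample size and test points are arbitrary.

```python
import random


def F_x(x):
    """PDF (distribution function) of the assumed X ~ uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


def g_inv(y):
    """Inverse of y = -2x + 1, Equation (5.9)."""
    return (1 - y) / 2


def empirical_cdf(samples, y):
    """Fraction of samples that are <= y."""
    return sum(v <= y for v in samples) / len(samples)


rng = random.Random(0)
ys = [-2 * rng.random() + 1 for _ in range(50000)]

for y in (-0.5, 0.0, 0.6):
    print(y, empirical_cdf(ys, y), 1 - F_x(g_inv(y)))  # Equation (5.10)
```

Replacing `1 - F_x(g_inv(y))` with `F_x(g_inv(y))` (the increasing-case formula (5.7)) gives visibly wrong values, which is the point of the distinction drawn above.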

Since the derivative $dg^{-1}(y)/dy$ in Equation (5.8) is always positive (as g(x) is strictly monotonic increasing) and it is always negative in Equation (5.11) (as g(x) is strictly monotonic decreasing), the results expressed by these two equations can be combined to arrive at Theorem 5.1. 

Theorem 5.1. Let X be a continuous random variable and Y = g(X) where g(X) is continuous in X and strictly monotone. Then 

$$f_Y(y) = f_X[g^{-1}(y)]\left|\frac{dg^{-1}(y)}{dy}\right|, \qquad (5.12)$$

where |u| denotes the absolute value of u. 

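Example 5.3. Problem: the pdf of X is given by (Cauchy distribution): 

$$f_X(x) = \frac{a}{\pi(x^2 + a^2)}, \qquad -\infty < x < \infty. \qquad (5.13)$$

Determine the pdf of Y where 

$$Y = 2X + 1. \qquad (5.14)$$

Answer: the transformation given by Equation (5.14) is strictly monotone. Equation (5.12) thus applies and we have 

$$g^{-1}(y) = \frac{y-1}{2} \qquad\text{and}\qquad \frac{dg^{-1}(y)}{dy} = \frac{1}{2}.$$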
Following Equation (5.12), the result is 

$$f_Y(y) = f_X\!\left(\frac{y-1}{2}\right)\left|\frac{1}{2}\right| = \frac{a}{2\pi}\,\frac{4}{(y-1)^2 + 4a^2} = \frac{2a}{\pi[(y-1)^2 + 4a^2]}, \qquad -\infty < y < \infty. \qquad (5.15)$$


It is valid over the entire range $-\infty < y < \infty$, as it is in correspondence with the range $-\infty < x < \infty$ defined in the range space $R_X$. 
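Theorem 5.1 translates directly into a small generic helper. In the Python sketch below, the helper name `pdf_monotone`, the central-difference approximation of $dg^{-1}(y)/dy$ and the parameter value a = 1.5 are our own assumptions; the helper is applied to Example 5.3 and compared with the closed form (5.15).

```python
from math import pi


def pdf_monotone(f_x, g_inv, y, h=1e-6):
    """Theorem 5.1, Equation (5.12): f_Y(y) = f_X[g^{-1}(y)] |dg^{-1}(y)/dy|.
    The derivative is approximated by a central difference (a numerical shortcut)."""
    deriv = (g_inv(y + h) - g_inv(y - h)) / (2 * h)
    return f_x(g_inv(y)) * abs(deriv)


a = 1.5  # assumed value of the Cauchy parameter a


def f_x(x):
    """Cauchy pdf, Equation (5.13)."""
    return a / (pi * (x * x + a * a))


def g_inv(y):
    """Inverse of Y = 2X + 1, Equation (5.14)."""
    return (y - 1) / 2


def f_y_closed_form(y):
    """Equation (5.15)."""
    return 2 * a / (pi * ((y - 1) ** 2 + 4 * a * a))


for y in (-4.0, 1.0, 6.5):
    print(y, pdf_monotone(f_x, g_inv, y), f_y_closed_form(y))
```

The absolute value makes the same helper work for decreasing transformations such as Equation (5.9).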

Example 5.4. Problem: the angle Φ of a pendulum as measured from the vertical is a random variable uniformly distributed over the interval $-\pi/2 < \Phi < \pi/2$. Determine the pdf of Y, the horizontal distance, as shown in Figure 5.4. 

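Answer: the transformation equation in this case is 

$$Y = \tan\Phi, \qquad (5.16)$$

where 

$$f_\Phi(\phi) = \begin{cases} \dfrac{1}{\pi}, & \text{for } -\pi/2 < \phi < \pi/2; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.17)$$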

As shown in Figure 5.5, Equation (5.16) is monotone within the range $-\pi/2 < \phi < \pi/2$. Hence, Equation (5.12) again applies and we have 

$$g^{-1}(y) = \tan^{-1}y \qquad\text{and}\qquad \frac{dg^{-1}(y)}{dy} = \frac{1}{1+y^2}.$$

Figure 5.4 Pendulum, in Example 5.4 

Figure 5.5 Transformation defined by Equation (5.16) 


The pdf of Y is thus given by 

$$f_Y(y) = f_\Phi(\tan^{-1}y)\,\frac{1}{1+y^2} = \frac{1}{\pi(1+y^2)}, \qquad -\infty < y < \infty. \qquad (5.18)$$


The range space $R_Y$ corresponding to $-\pi/2 < \phi < \pi/2$ is $-\infty < y < \infty$. The pdf given above is thus valid for the whole range of y. The random variable Y has the so-called Cauchy distribution and is plotted in Figure 5.6. 
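A short simulation of Example 5.4 in Python: draw Φ uniformly on $(-\pi/2, \pi/2)$, form Y = tan Φ, and compare the empirical distribution function with $1/2 + \tan^{-1}(y)/\pi$, the integral of Equation (5.18). Seed, sample size and test points are arbitrary choices.

```python
import random
from math import atan, pi, tan

rng = random.Random(42)
# Phi uniform on (-pi/2, pi/2), as in Example 5.4; Y = tan(Phi), Equation (5.16)
ys = [tan(rng.uniform(-pi / 2, pi / 2)) for _ in range(50000)]


def cauchy_cdf(y):
    """Integral of Equation (5.18): 1/2 + tan^{-1}(y) / pi."""
    return 0.5 + atan(y) / pi


for y in (-2.0, 0.0, 1.0, 5.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    print(f"y={y:5.1f}  empirical={empirical:.4f}  Cauchy={cauchy_cdf(y):.4f}")
```

The distribution function matches well, but the sample mean of `ys` does not settle down as the sample grows, since the Cauchy distribution has no mean.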

Example 5.5. Problem: the resistance R in the circuit shown in Figure 5.7 is random and has a triangular distribution, as shown in Figure 5.8. With a constant current i = 0.1 A and a constant resistance $r_0$ = 100 Ω, determine the pdf of voltage V. 

Answer: the relationship between V and R is 

$$V = i(R + r_0) = 0.1(R + 100), \qquad (5.19)$$
Figure 5.6 Probability density function, $f_Y(y)$, in Example 5.4 
and 

$$f_R(r) = \begin{cases} 0.005(r - 90), & \text{for } 90 \le r \le 110; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.20)$$


The range $90 \le r \le 110$ corresponds to $19 \le v \le 21$ in the range space $R_V$. It is clear that $f_V(v)$ is zero outside the interval $19 \le v \le 21$. In this interval, since Equation (5.19) represents a strictly monotonic function, we obtain by means of Equation (5.12), 

$$f_V(v) = f_R[g^{-1}(v)]\left|\frac{dg^{-1}(v)}{dv}\right|, \qquad 19 \le v \le 21,$$


where 

$$g^{-1}(v) = -100 + 10v \qquad\text{and}\qquad \frac{dg^{-1}(v)}{dv} = 10.$$

We thus have 

$$f_V(v) = 0.005(-100 + 10v - 90)(10) = 0.5(v - 19), \qquad \text{for } 19 \le v \le 21,$$

and 

$$f_V(v) = 0, \qquad \text{elsewhere}.$$

The pdf of V is plotted in Figure 5.9. 
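A numerical sanity check of Example 5.5 in Python. It samples R from Equation (5.20) by inverting its PDF, $F_R(r) = 0.0025(r - 90)^2$ on [90, 110] (obtained by integrating (5.20); inverse-transform sampling is our choice of method), maps the samples through Equation (5.19), and compares them with $f_V(v) = 0.5(v - 19)$. Seed, sample size and grid size are arbitrary.

```python
import random
from math import sqrt


def f_v(v):
    """Result of Example 5.5: f_V(v) = 0.5 (v - 19) on 19 <= v <= 21."""
    return 0.5 * (v - 19) if 19 <= v <= 21 else 0.0


def sample_r(rng):
    """Inverse-transform draw: solve 0.0025 (r - 90)^2 = u for r."""
    return 90 + 20 * sqrt(rng.random())


rng = random.Random(3)
vs = [0.1 * (sample_r(rng) + 100) for _ in range(50000)]  # Equation (5.19)

# midpoint-rule quadrature of f_V over [19, 21]
m = 20000
dv = 2.0 / m
grid = [19 + (i + 0.5) * dv for i in range(m)]
total = sum(f_v(v) for v in grid) * dv
mean_quad = sum(v * f_v(v) for v in grid) * dv
print(f"integral of f_V: {total:.6f}")
print(f"E{{V}}: quadrature {mean_quad:.4f}, simulation {sum(vs) / len(vs):.4f}")
print(f"P(V <= 20): simulation {sum(v <= 20 for v in vs) / len(vs):.4f}, exact 0.25")
```

The exact value 0.25 is $\int_{19}^{20} 0.5(v - 19)\,dv$, and the exact mean is 61/3.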





In the examples given above, it is easy to verify that all density functions 
obtained satisfy the required properties. 

Let us now turn our attention to a more general case where function Y = g(X) is not necessarily strictly monotonic. Two examples are given in Figures 5.10 and 5.11. In Figure 5.10, the monotonic property of the transformation holds for $y < y_1$ and $y > y_2$, and Equation (5.12) can be used to determine the pdf of Y in these intervals of y. For $y_1 \le y \le y_2$, however, we must start from the beginning and consider $F_Y(y) = P(Y \le y)$. The region defined by $Y \le y$ in the range space $R_Y$ covers the heavier portions of the function y = g(x), as shown in Figure 5.10. Thus: 

$$\begin{aligned} F_Y(y) = P(Y \le y) &= P[X \le g_1^{-1}(y)] + P[g_2^{-1}(y) \le X \le g_3^{-1}(y)] \\ &= P[X \le g_1^{-1}(y)] + P[X \le g_3^{-1}(y)] - P[X \le g_2^{-1}(y)] \\ &= F_X[g_1^{-1}(y)] + F_X[g_3^{-1}(y)] - F_X[g_2^{-1}(y)], \qquad y_1 \le y \le y_2, \end{aligned} \qquad (5.21)$$

where $x_1 = g_1^{-1}(y)$, $x_2 = g_2^{-1}(y)$, and $x_3 = g_3^{-1}(y)$ are roots for x of function y = g(x) in terms of y. 

As before, the relationship between the pdfs of X and Y is found by differentiating Equation (5.21) with respect to y. It is given by 

$$f_Y(y) = f_X[g_1^{-1}(y)]\frac{dg_1^{-1}(y)}{dy} + f_X[g_3^{-1}(y)]\frac{dg_3^{-1}(y)}{dy} - f_X[g_2^{-1}(y)]\frac{dg_2^{-1}(y)}{dy}, \qquad y_1 \le y \le y_2. \qquad (5.22)$$
Figure 5.11 An example of nonmonotonic function y = g(x) 


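Since derivative $dg_2^{-1}(y)/dy$ is negative whereas the others are positive, Equation (5.22) takes the convenient form 

$$f_Y(y) = \sum_{j=1}^{3} f_X[g_j^{-1}(y)]\left|\frac{dg_j^{-1}(y)}{dy}\right|, \qquad y_1 \le y \le y_2. \qquad (5.23)$$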


Figure 5.11 represents the transformation y = sin x; this equation has an infinite (but countable) number of roots, $x_1 = g_1^{-1}(y)$, $x_2 = g_2^{-1}(y)$, ..., for any y in the interval $-1 < y < 1$. Following the procedure outlined above, an equation similar to Equation (5.21) (but with an infinite number of terms) can be established for $F_Y(y)$ and, as seen from Equation (5.23), the pdf of Y now has the form 

$$f_Y(y) = \sum_{j=1}^{\infty} f_X[g_j^{-1}(y)]\left|\frac{dg_j^{-1}(y)}{dy}\right|, \qquad -1 < y < 1. \qquad (5.24)$$

It is clear from Figure 5.11 that $f_Y(y) = 0$ elsewhere. 

A general pattern now emerges when function Y = g(X) is nonmonotonic. Equations (5.23) and (5.24) lead to Theorem 5.2. 

Theorem 5.2. Let X be a continuous random variable and Y = g(X), where g(X) is continuous in X, and y = g(x) admits at most a countable number of roots $x_1 = g_1^{-1}(y)$, $x_2 = g_2^{-1}(y)$, .... Then: 

$$f_Y(y) = \sum_{j=1}^{r} f_X[g_j^{-1}(y)]\left|\frac{dg_j^{-1}(y)}{dy}\right|, \qquad (5.25)$$

where r is the number of roots for x of equation y = g(x). Clearly, Equation (5.12) is contained in this theorem as a special case (r = 1). 

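Example 5.6. Problem: in Example 5.4, let random variable Φ now be uniformly distributed over the interval $-\pi < \Phi < \pi$. Determine the pdf of $Y = \tan\Phi$. 

Answer: the pdf of Φ is now 

$$f_\Phi(\phi) = \begin{cases} \dfrac{1}{2\pi}, & \text{for } -\pi < \phi < \pi; \\ 0, & \text{elsewhere;} \end{cases}$$

and the relevant portion of the transformation equation is plotted in Figure 5.12. For each y, the two roots $\phi_1$ and $\phi_2$ of $y = \tan\phi$ are (see Figure 5.12) 

$$\text{for } y < 0: \quad \phi_1 = g_1^{-1}(y) = \tan^{-1}y, \; -\tfrac{\pi}{2} < \phi_1 < 0; \qquad \phi_2 = g_2^{-1}(y) = \pi + \tan^{-1}y, \; \tfrac{\pi}{2} < \phi_2 < \pi;$$

$$\text{for } y > 0: \quad \phi_1 = g_1^{-1}(y) = -\pi + \tan^{-1}y, \; -\pi < \phi_1 < -\tfrac{\pi}{2}; \qquad \phi_2 = g_2^{-1}(y) = \tan^{-1}y, \; 0 < \phi_2 < \tfrac{\pi}{2}.$$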
For all y, Equation (5.25) yields 

$$f_Y(y) = \sum_{j=1}^{2} f_\Phi[g_j^{-1}(y)]\left|\frac{dg_j^{-1}(y)}{dy}\right| = \frac{1}{2\pi}\left(\frac{1}{1+y^2}\right) + \frac{1}{2\pi}\left(\frac{1}{1+y^2}\right) = \frac{1}{\pi(1+y^2)}, \qquad -\infty < y < \infty, \qquad (5.26)$$

a result identical to the solution for Example 5.4 [see Equation (5.18)]. 

Example 5.7. Problem: determine the pdf of $Y = X^2$ where X is normally distributed according to 

$$f_X(x) = \frac{1}{(2\pi)^{1/2}}e^{-x^2/2}, \qquad -\infty < x < \infty. \qquad (5.27)$$

As shown in Figure 5.13, $f_Y(y) = 0$ for y < 0 since the transformation equation has no real roots in this range. For y > 0, the two roots of $y = x^2$ are 

$$x_{1,2} = g_{1,2}^{-1}(y) = \pm y^{1/2}.$$
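Hence, using Equation (5.25), 

$$f_Y(y) = \sum_{j=1}^{2} f_X[g_j^{-1}(y)]\left|\frac{dg_j^{-1}(y)}{dy}\right| = \frac{1}{(2\pi)^{1/2}}e^{-y/2}\left(\frac{1}{2y^{1/2}} + \frac{1}{2y^{1/2}}\right) = \frac{1}{(2\pi y)^{1/2}}e^{-y/2},$$

or 

$$f_Y(y) = \begin{cases} \dfrac{1}{(2\pi y)^{1/2}}e^{-y/2}, & \text{for } y > 0; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.28)$$

This is the so-called $\chi^2$ distribution (with one degree of freedom), to be discussed in more detail in Section 7.4.2. 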
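Theorem 5.2 can likewise be sketched as a sum over roots. In the Python sketch below, the names `pdf_by_roots` and `square_roots` are hypothetical helpers of ours; the caller supplies the roots of y = g(x) together with $|dg_j^{-1}(y)/dy|$, and the result is compared with Equation (5.28) for Example 5.7.

```python
from math import exp, pi, sqrt


def pdf_by_roots(f_x, roots, y):
    """Theorem 5.2, Equation (5.25): sum of f_X(x_j) |dg_j^{-1}(y)/dy| over the roots.
    `roots(y)` is a caller-supplied helper returning (x_j, |dg_j^{-1}(y)/dy|) pairs."""
    return sum(f_x(x) * jac for x, jac in roots(y))


def f_x(x):
    """Standard normal pdf, Equation (5.27)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def square_roots(y):
    """Roots of y = x^2: x = +/- y^{1/2}, each with |dx/dy| = 1 / (2 y^{1/2})."""
    if y <= 0:
        return []  # no real roots for y < 0 (y = 0 has probability zero)
    r = sqrt(y)
    return [(r, 1 / (2 * r)), (-r, 1 / (2 * r))]


def chi2_one_dof(y):
    """Equation (5.28)."""
    return exp(-y / 2) / sqrt(2 * pi * y) if y > 0 else 0.0


for y in (0.1, 1.0, 4.0, -1.0):
    print(y, pdf_by_roots(f_x, square_roots, y), chi2_one_dof(y))
```

Only the root helper changes from one transformation to another; for a strictly monotone g it returns a single pair, recovering Theorem 5.1.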

Example 5.8. Problem: a random voltage $V_1$ having a uniform distribution over interval 90 V < $V_1$ < 110 V is put into a nonlinear device (a limiter), as shown in Figure 5.14. Determine the probability distribution of the output voltage $V_2$. 

Answer: the relationship between $V_1$ and $V_2$ is, as seen from Figure 5.14, 

$$V_2 = g(V_1), \qquad (5.29)$$
where 

$$g(V_1) = \begin{cases} 0, & V_1 < 95; \\ \dfrac{V_1 - 95}{10}, & 95 \le V_1 \le 105; \\ 1, & V_1 > 105. \end{cases}$$

The theorems stated in this section do not apply in this case to the portions $V_1 < 95$ V and $V_1 > 105$ V, because an uncountably infinite number of roots for $v_1$ exists in these regions. However, we deduce immediately from Figure 5.14 that 

$$P(V_2 = 0) = P(V_1 < 95) = F_{V_1}(95) = \int_{90}^{95} f_{V_1}(v_1)\,dv_1 = \frac{1}{4};$$

$$P(V_2 = 1) = P(V_1 > 105) = 1 - F_{V_1}(105) = \frac{1}{4}.$$

For the middle portion, Equation (5.7) leads to 

$$F_{V_2}(v_2) = F_{V_1}[g^{-1}(v_2)] = F_{V_1}(10v_2 + 95), \qquad 0 < v_2 < 1.$$

Now, 

$$F_{V_1}(v_1) = \frac{v_1 - 90}{20}, \qquad 90 \le v_1 \le 110.$$

We thus have 

$$F_{V_2}(v_2) = \frac{1}{20}(10v_2 + 95 - 90) = \frac{2v_2 + 1}{4}, \qquad 0 < v_2 < 1.$$

The PDF, $F_{V_2}(v_2)$, is shown in Figure 5.15, an example of a mixed distribution. 
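The mixed character of $F_{V_2}(v_2)$ shows up clearly in a simulation. The Python sketch below passes uniform samples of $V_1$ through the limiter of Equation (5.29), estimates the two point masses at 0 and 1, and compares the empirical distribution function with $(2v_2 + 1)/4$ on $0 < v_2 < 1$. Seed, sample size and test points are arbitrary.

```python
import random


def limiter(v1):
    """Equation (5.29): the limiter characteristic of Example 5.8."""
    if v1 < 95:
        return 0.0
    if v1 > 105:
        return 1.0
    return (v1 - 95) / 10


rng = random.Random(5)
v2 = [limiter(rng.uniform(90, 110)) for _ in range(80000)]
N = len(v2)

print("P(V2 = 0) ~", sum(v == 0.0 for v in v2) / N)  # point mass at 0
print("P(V2 = 1) ~", sum(v == 1.0 for v in v2) / N)  # point mass at 1
for y in (0.2, 0.5, 0.8):
    empirical = sum(v <= y for v in v2) / N
    print(y, round(empirical, 3), (2 * y + 1) / 4)  # continuous part of F_V2
```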


Figure 5.15 Distribution $F_{V_2}(v_2)$ in Example 5.8


5.1.2 MOMENTS

Having developed methods of determining the probability distribution of
$Y = g(X)$, it is a straightforward matter to calculate all the desired moments
of $Y$ if they exist. However, this procedure (determining the moments of $Y$
by first finding the probability law of $Y$) is cumbersome and unnecessary if only the
moments of $Y$ are of interest.

A more expedient and direct way of finding the moments of $Y = g(X)$, given
the probability law of $X$, is to express moments of $Y$ as expectations of
appropriate functions of $X$; they can then be evaluated directly within the
probability domain of $X$. In fact, all the 'machinery' for proceeding along this
line is contained in Equations (4.1) and (4.2).

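Let $Y = g(X)$ and assume that all desired moments of $Y$ exist. The $n$th
moment of $Y$ can be expressed as

$$
E\{Y^n\} = E\{g^n(X)\}. \qquad (5.30)
$$

It follows from Equations (4.1) and (4.2) that, in terms of the pmf or pdf of $X$,

$$
E\{Y^n\} = E\{g^n(X)\} = \begin{cases} \displaystyle\sum_i g^n(x_i)\, p_X(x_i), & X \text{ discrete;} \\ \displaystyle\int_{-\infty}^{\infty} g^n(x) f_X(x)\,dx, & X \text{ continuous.} \end{cases} \qquad (5.31)
$$

An alternative approach is to determine the characteristic function of $Y$, from
which all moments of $Y$ can be generated through differentiation. As we see
from the definition [Equations (4.46) and (4.47)], the characteristic function of
$Y$ can be expressed by

$$
\phi_Y(t) = E\{e^{jtY}\} = E\{e^{jtg(X)}\} = \begin{cases} \displaystyle\sum_i e^{jtg(x_i)}\, p_X(x_i), & X \text{ discrete;} \\ \displaystyle\int_{-\infty}^{\infty} e^{jtg(x)} f_X(x)\,dx, & X \text{ continuous.} \end{cases} \qquad (5.32)
$$

Upon evaluating $\phi_Y(t)$, the moments of $Y$ are given by [Equation (4.52)]:

$$
E\{Y^n\} = j^{-n} \phi_Y^{(n)}(0), \quad n = 1, 2, \ldots. \qquad (5.33)
$$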


Example 5.9. Problem: a random variable $X$ is discrete and its pmf is given in
Example 5.1. Determine the mean and variance of $Y$ where $Y = 2X + 1$.
Answer: using the first of Equations (5.31), we obtain

$$
E\{Y\} = E\{2X + 1\} = \sum_i (2x_i + 1)\, p_X(x_i) = \frac{3}{4}, \qquad (5.34)
$$

$$
E\{Y^2\} = E\{(2X + 1)^2\} = \sum_i (2x_i + 1)^2\, p_X(x_i) = 5; \qquad (5.35)
$$

and

$$
\sigma_Y^2 = E\{Y^2\} - E^2\{Y\} = 5 - \left(\frac{3}{4}\right)^2 = \frac{71}{16}. \qquad (5.36)
$$

Following the second approach, let us use the method of characteristic functions
described by Equations (5.32) and (5.33). The characteristic function of $Y$ is

$$
\phi_Y(t) = \sum_i e^{jt(2x_i + 1)}\, p_X(x_i) = \frac{1}{8}\left(4e^{-jt} + 2e^{jt} + e^{3jt} + e^{5jt}\right),
$$

and we have

$$
E\{Y\} = j^{-1}\phi_Y^{(1)}(0) = j^{-1}\left(\frac{j}{8}\right)(-4 + 2 + 3 + 5) = \frac{3}{4},
$$

$$
E\{Y^2\} = j^{-2}\phi_Y^{(2)}(0) = j^{-2}\left(\frac{j^2}{8}\right)(4 + 2 + 9 + 25) = 5.
$$

As expected, these answers agree with the results obtained earlier [Equations
(5.34) and (5.35)].
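The two routes of Example 5.9 can be reproduced in a few lines of Python. The pmf values used below are an assumption: they are read off the characteristic function $\phi_Y(t)$ shown above, because Example 5.1 itself is outside this excerpt. Exact fractions give Equations (5.34) to (5.36). Central finite differences of $\phi_Y(t)$ at $t = 0$ stand in for the analytical derivatives in Equation (5.33).

```python
import cmath
from fractions import Fraction

# pmf of X inferred from phi_Y(t) = (4e^{-jt} + 2e^{jt} + e^{3jt} + e^{5jt}) / 8
p_X = {-1: Fraction(1, 2), 0: Fraction(1, 4), 1: Fraction(1, 8), 2: Fraction(1, 8)}

def g(x):
    return 2 * x + 1

# Direct approach, first of Equations (5.31)
EY = sum(g(x) * p for x, p in p_X.items())
EY2 = sum(g(x) ** 2 * p for x, p in p_X.items())
var_Y = EY2 - EY ** 2

# Characteristic-function approach, Equations (5.32)-(5.33),
# with derivatives at t = 0 replaced by central finite differences.
def phi_Y(t):
    return sum(float(p) * cmath.exp(1j * t * g(x)) for x, p in p_X.items())

h = 1e-4
d1 = (phi_Y(h) - phi_Y(-h)) / (2 * h)               # approximates phi_Y'(0)
d2 = (phi_Y(h) - 2 * phi_Y(0) + phi_Y(-h)) / h ** 2  # approximates phi_Y''(0)
EY_cf = (d1 / 1j).real        # j^{-1} phi_Y'(0)
EY2_cf = (d2 / 1j ** 2).real  # j^{-2} phi_Y''(0)

print(EY, EY2, var_Y)  # 3/4 5 71/16
print(EY_cf, EY2_cf)
```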

Let us again remark that the procedures described above do not require
knowledge of $p_Y(y)$. One can determine $p_Y(y)$ before the moment calculations, but this
is less expedient when only moments of $Y$ are desired. Another remark to be
made is that, since the transformation is linear ($Y = 2X + 1$) in this case, only
the first two moments of $X$ are needed in finding the first two moments of $Y$,
that is,

$$
E\{Y\} = E\{2X + 1\} = 2E\{X\} + 1,
$$

$$
E\{Y^2\} = E\{(2X + 1)^2\} = 4E\{X^2\} + 4E\{X\} + 1,
$$

as seen from Equations (5.34) and (5.35). When the transformation is nonlinear,
however, moments of $X$ of different orders will be needed, as shown below.

Example 5.10. Problem: from Example 5.7, determine the mean and variance
of $Y = X^2$.
Answer: the mean of $Y$ is, in terms of $f_X(x)$,

$$
E\{Y\} = E\{X^2\} = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx = 1, \qquad (5.37)
$$

and the second moment of $Y$ is given by

$$
E\{Y^2\} = E\{X^4\} = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{\infty} x^4 e^{-x^2/2}\,dx = 3. \qquad (5.38)
$$

Thus,

$$
\sigma_Y^2 = E\{Y^2\} - E^2\{Y\} = 3 - 1 = 2. \qquad (5.39)
$$

In this case, complete knowledge of $f_X(x)$ is not needed, but we do need to
know the second and fourth moments of $X$.
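The moment integrals in Equations (5.37) and (5.38) can be checked numerically. The sketch below applies the midpoint rule to the standard normal density on $[-12, 12]$, beyond which the tails are negligible at this precision. The truncation interval and step count are illustrative assumptions.

```python
import math

def normal_pdf(x):
    """Standard normal pdf, Equation (5.27)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_moment(k, lo=-12.0, hi=12.0, n=50_000):
    """E{X**k} for standard normal X by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += x ** k * normal_pdf(x)
    return h * total

mean_Y = normal_moment(2)        # E{Y} = E{X^2}, Equation (5.37)
second_Y = normal_moment(4)      # E{Y^2} = E{X^4}, Equation (5.38)
var_Y = second_Y - mean_Y ** 2   # Equation (5.39)
print(round(mean_Y, 6), round(second_Y, 6), round(var_Y, 6))
```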


5.2 FUNCTIONS OF TWO OR MORE RANDOM VARIABLES 

In this section, we extend earlier results to a more general case. The random
variable $Y$ is now a function of $n$ jointly distributed random variables,
$X_1, X_2, \ldots, X_n$. Formulae will be developed for the corresponding distribution
of $Y$.

As in the single random variable case, the case in which $X_1, X_2, \ldots$, and $X_n$
are discrete random variables presents no problem, and we will demonstrate this
by way of an example (Example 5.13). Our basic interest here lies in the
determination of the distribution of $Y$ when all $X_j$, $j = 1, 2, \ldots, n$, are continuous
random variables. Consider the transformation

$$
Y = g(X_1, \ldots, X_n), \qquad (5.40)
$$

where the joint distribution of $X_1, X_2, \ldots$, and $X_n$ is assumed to be specified in
terms of their joint probability density function (jpdf), $f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$, or
their joint probability distribution function (JPDF), $F_{X_1 \cdots X_n}(x_1, \ldots, x_n)$. In a
more compact notation, they can be written as $f_X(x)$ and $F_X(x)$, respectively,
where $X$ is an $n$-dimensional random vector with components $X_1, X_2, \ldots, X_n$.

The starting point of the derivation is the same as in the single-random-variable
case; that is, we consider $F_Y(y) = P(Y \le y)$. In terms of $X$, this
probability is equal to $P[g(X) \le y]$. Thus:

$$
F_Y(y) = P(Y \le y) = P[g(X) \le y] = F_X[x: g(x) \le y]. \qquad (5.41)
$$

The final expression in the above represents the JPDF of $X$ for which the
argument $x$ satisfies $g(x) \le y$. In terms of $f_X(x)$, it is given by

$$
F_X[x: g(x) \le y] = \int \cdots \int_{\{R^n:\, g(x) \le y\}} f_X(x)\,dx, \qquad (5.42)
$$

where the limits of the integrals are determined by an $n$-dimensional region $R^n$
within which $g(x) \le y$ is satisfied. In view of Equations (5.41) and (5.42), the
PDF of $Y$, $F_Y(y)$, can be determined by evaluating the $n$-dimensional integral in
Equation (5.42). The crucial step in this derivation is clearly the identification
of $R^n$, which must be carried out on a problem-to-problem basis. As $n$ becomes
large, this can present a formidable obstacle.
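When the region $R^n$ in Equation (5.42) is hard to describe, the integral can still be estimated by Monte Carlo. The fraction of draws of $X$ that satisfy $g(X) \le y$ estimates $F_Y(y)$ directly. The sketch below is a generic helper under that idea. The names `mc_cdf` and `sample_uniform_cube` are hypothetical, and the three-uniform test case is an illustration only, chosen because its exact answer is known: $F_Y(1) = 1/6$, the volume of the corner simplex.

```python
import random

def mc_cdf(g, sample_x, y, draws=100_000, seed=0):
    """Estimate F_Y(y) = P[g(X) <= y], Equations (5.41)-(5.42), by Monte Carlo.

    The fraction of draws of X that fall in {x: g(x) <= y} estimates the
    n-fold integral of f_X over that region, so the region never has to be
    described explicitly.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws) if g(sample_x(rng)) <= y)
    return hits / draws

def sample_uniform_cube(rng, dim=3):
    """Hypothetical example: X has independent components, each uniform on (0, 1)."""
    return [rng.random() for _ in range(dim)]

# g(x) = x1 + x2 + x3; the exact F_Y(1) is 1/6.
estimate = mc_cdf(sum, sample_uniform_cube, 1.0)
print(estimate, 1 / 6)
```

The statistical error shrinks only like $1/\sqrt{\text{draws}}$, so this approach complements, rather than replaces, the exact integration carried out in the examples that follow.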

The procedure outlined above can be best demonstrated through examples. 

Example 5.11. Problem: let $Y = X_1 X_2$. Determine the pdf of $Y$ in terms of
$f_{X_1X_2}(x_1, x_2)$.

Answer: from Equations (5.41) and (5.42), we have

$$
F_Y(y) = \iint_{\{R^2:\, x_1x_2 \le y\}} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2. \qquad (5.43)
$$

The equation $x_1x_2 = y$ is graphed in Figure 5.16, in which the shaded area
represents $R^2$, or $x_1x_2 \le y$. The limits of the double integral can thus be
determined, and Equation (5.43) becomes

$$
F_Y(y) = \int_0^{\infty}\int_{-\infty}^{y/x_2} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 + \int_{-\infty}^{0}\int_{y/x_2}^{\infty} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2. \qquad (5.44)
$$

Figure 5.16 Region in Example 5.11

Substituting $f_{X_1X_2}(x_1, x_2)$ into Equation (5.44) enables us to determine $F_Y(y)$
and, on differentiating with respect to $y$, gives $f_Y(y)$.

For the special case where $X_1$ and $X_2$ are independent, we have
$f_{X_1X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)$, and Equation (5.44) simplifies to

$$
F_Y(y) = \int_0^{\infty} F_{X_1}\!\left(\frac{y}{x_2}\right) f_{X_2}(x_2)\,dx_2 + \int_{-\infty}^{0}\left[1 - F_{X_1}\!\left(\frac{y}{x_2}\right)\right] f_{X_2}(x_2)\,dx_2,
$$

and

$$
f_Y(y) = \frac{dF_Y(y)}{dy} = \int_{-\infty}^{\infty} \frac{1}{|x_2|}\, f_{X_1}\!\left(\frac{y}{x_2}\right) f_{X_2}(x_2)\,dx_2. \qquad (5.45)
$$
As a numerical example, suppose that $X_1$ and $X_2$ are independent and

$$
f_{X_1}(x_1) = \begin{cases} 2x_1, & \text{for } 0 < x_1 < 1; \\ 0, & \text{elsewhere;} \end{cases}
\qquad
f_{X_2}(x_2) = \begin{cases} 1 - \dfrac{x_2}{2}, & \text{for } 0 < x_2 < 2; \\ 0, & \text{elsewhere.} \end{cases}
$$

The pdf of $Y$ is, following Equation (5.45),

$$
f_Y(y) = \begin{cases} \displaystyle\int_y^2 \frac{1}{x_2}\, 2\left(\frac{y}{x_2}\right)\left(1 - \frac{x_2}{2}\right) dx_2, & \text{for } 0 < y < 2; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.46)
$$

In the above, the integration limits are determined from the fact that $f_{X_1}(x_1)$
and $f_{X_2}(x_2)$ are nonzero in the intervals $0 < x_1 < 1$ and $0 < x_2 < 2$. With the
argument of $f_{X_1}(x_1)$ replaced by $y/x_2$ in the integral, we have $0 < y/x_2 < 1$
and $0 < x_2 < 2$, which are equivalent to $y < x_2 < 2$. Also, the range $0 < y < 2$ for
the nonzero portion of $f_Y(y)$ is determined from the fact that, since $y = x_1x_2$,
the intervals $0 < x_1 < 1$ and $0 < x_2 < 2$ directly give $0 < y < 2$.

Finally, Equation (5.46) gives

$$
f_Y(y) = \begin{cases} 2 + y(\ln y - 1 - \ln 2), & \text{for } 0 < y < 2; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.47)
$$

This is shown graphically in Figure 5.17. It is an easy exercise to show that

$$
\int_0^2 f_Y(y)\,dy = 1.
$$

Figure 5.17 Probability density function, $f_Y(y)$, in Example 5.11

Example 5.12. Problem: let $Y = X_1/X_2$, where $X_1$ and $X_2$ are independent and
identically distributed according to

$$
f_{X_1}(x_1) = \begin{cases} e^{-x_1}, & \text{for } x_1 > 0; \\ 0, & \text{elsewhere;} \end{cases} \qquad (5.48)
$$

and similarly for $X_2$. Determine $f_Y(y)$.

Answer: it follows from Equations (5.41) and (5.42) that

$$
F_Y(y) = \iint_{\{R^2:\, x_1/x_2 \le y\}} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2.
$$

The region for positive values of $x_1$ and $x_2$ is shown as the shaded area in
Figure 5.18. Hence,

$$
F_Y(y) = \begin{cases} \displaystyle\int_0^{\infty}\int_0^{x_2 y} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2, & \text{for } y > 0; \\ 0, & \text{elsewhere.} \end{cases}
$$

For independent $X_1$ and $X_2$,

$$
F_Y(y) = \begin{cases} \displaystyle\int_0^{\infty}\int_0^{x_2 y} f_{X_1}(x_1) f_{X_2}(x_2)\,dx_1\,dx_2 = \int_0^{\infty} F_{X_1}(x_2 y) f_{X_2}(x_2)\,dx_2, & \text{for } y > 0; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.49)
$$

The pdf of $Y$ is thus given by, on differentiating Equation (5.49) with respect
to $y$,

$$
f_Y(y) = \begin{cases} \displaystyle\int_0^{\infty} x_2 f_{X_1}(x_2 y) f_{X_2}(x_2)\,dx_2, & \text{for } y > 0; \\ 0, & \text{elsewhere;} \end{cases} \qquad (5.50)
$$

and, on substituting Equation (5.48) into Equation (5.50), it takes the form

$$
f_Y(y) = \begin{cases} \displaystyle\int_0^{\infty} x_2 e^{-x_2(1+y)}\,dx_2 = \frac{1}{(1+y)^2}, & \text{for } y > 0; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.51)
$$

Again, it is easy to check that

$$
\int_0^{\infty} f_Y(y)\,dy = 1.
$$
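Integrating Equation (5.51) gives $F_Y(y) = y/(1+y)$ for $y > 0$. The sketch below checks this against the empirical distribution of simulated ratios of independent unit exponentials, as in Equation (5.48). The seed, sample size and helper names are illustrative assumptions.

```python
import random

def F_Y(y):
    """PDF obtained by integrating (5.51): F_Y(y) = y / (1 + y) for y > 0."""
    return y / (1 + y) if y > 0 else 0.0

rng = random.Random(7)
n = 100_000
# X1 and X2 independent with pdf e^{-x}, x > 0, as in Equation (5.48).
ratios = [rng.expovariate(1.0) / rng.expovariate(1.0) for _ in range(n)]

def empirical_cdf(y):
    return sum(1 for r in ratios if r <= y) / n

for y in (0.5, 1.0, 3.0):
    print(y, empirical_cdf(y), F_Y(y))
```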

Example 5.13. To show that it is elementary to obtain solutions to problems
discussed in this section when $X_1, X_2, \ldots$, and $X_n$ are discrete, consider again
$Y = X_1/X_2$, given that $X_1$ and $X_2$ are discrete and their joint probability mass
function (jpmf), $p_{X_1X_2}(x_1, x_2)$, is tabulated in Table 5.1. In this case, the pmf of $Y$
is easily determined by assignment of probabilities $p_{X_1X_2}(x_1, x_2)$ to the
corresponding values of $y = x_1/x_2$. Thus, we obtain:

$$
p_Y(y) = \begin{cases} 0.5, & \text{for } y = \tfrac{1}{2}; \\ 0.24 + 0.04 = 0.28, & \text{for } y = 1; \\ 0.04, & \text{for } y = \tfrac{3}{2}; \\ 0.06, & \text{for } y = 2; \\ 0.12, & \text{for } y = 3. \end{cases}
$$
Table 5.1 Joint probability mass function, $p_{X_1X_2}(x_1, x_2)$, in Example 5.13

                 x_1 = 1   x_1 = 2   x_1 = 3
    x_2 = 1       0.04      0.06      0.12
    x_2 = 2       0.5       0.24      0.04

Example 5.14. Problem: in structural reliability studies, the probability of
failure $q$ is defined by

$$
q = P(R < S),
$$

where $R$ and $S$ represent, respectively, structural resistance and applied force.
Let $R$ and $S$ be independent random variables taking only positive values.
Determine $q$ in terms of the probability distributions associated with $R$ and $S$.

Answer: let $Y = R/S$. Probability $q$ can be expressed by

$$
q = P(Y < 1) = F_Y(1).
$$

Identifying $R$ and $S$ with $X_1$ and $X_2$, respectively, in Example 5.12, it follows
from Equation (5.49) that

$$
q = F_Y(1) = \int_0^{\infty} F_R(s) f_S(s)\,ds.
$$
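To see the failure-probability formula in action, the sketch below evaluates $q = \int_0^{\infty} F_R(s) f_S(s)\,ds$ numerically. It assumes, purely for illustration, exponential resistance and load with hypothetical rates `lam_r` and `lam_s`. For that case the integral has the closed form $\lambda_R/(\lambda_R + \lambda_S)$, which serves as a check.

```python
import math

def failure_probability(F_R, f_S, s_max, n=100_000):
    """q = integral over s > 0 of F_R(s) f_S(s) ds, truncated at s_max (midpoint rule)."""
    h = s_max / n
    return h * sum(F_R((k + 0.5) * h) * f_S((k + 0.5) * h) for k in range(n))

# Hypothetical distributions: exponential resistance R and load S with these rates.
lam_r, lam_s = 0.5, 2.0

def F_R(r):
    return 1 - math.exp(-lam_r * r)

def f_S(s):
    return lam_s * math.exp(-lam_s * s)

q = failure_probability(F_R, f_S, s_max=40.0)
print(q, lam_r / (lam_r + lam_s))
```

For other resistance and load models only `F_R` and `f_S` need to change, provided `s_max` is large enough that $f_S$ is negligible beyond it.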


Example 5.15. Problem: determine $F_Y(y)$ in terms of $F_{X_1X_2}(x_1, x_2)$ when
$Y = \min(X_1, X_2)$.

Answer: now,

$$
F_Y(y) = \iint_{\{R^2:\, \min(x_1, x_2) \le y\}} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2,
$$

where region $R^2$ is shown in Figure 5.19. Thus

$$
\begin{aligned}
F_Y(y) &= \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 + \int_{y}^{\infty}\int_{-\infty}^{y} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 \\
&= \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 + \int_{-\infty}^{\infty}\int_{-\infty}^{y} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 \\
&\quad - \int_{-\infty}^{y}\int_{-\infty}^{y} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2 \\
&= F_{X_2}(y) + F_{X_1}(y) - F_{X_1X_2}(y, y),
\end{aligned}
$$

which is the desired solution. If random variables $X_1$ and $X_2$ are independent,
straightforward differentiation shows that

$$
f_Y(y) = f_{X_1}(y)[1 - F_{X_2}(y)] + f_{X_2}(y)[1 - F_{X_1}(y)].
$$

Let us note here that the results given above can be obtained following a
different, and more direct, procedure. Note that the event $[\min(X_1, X_2) \le y]$ is
equivalent to the event $(X_1 \le y \cup X_2 \le y)$. Hence,

$$
F_Y(y) = P(Y \le y) = P[\min(X_1, X_2) \le y] = P(X_1 \le y \cup X_2 \le y).
$$

Since

$$
P(A \cup B) = P(A) + P(B) - P(AB),
$$

we have

$$
F_Y(y) = P(X_1 \le y) + P(X_2 \le y) - P(X_1 \le y \cap X_2 \le y) = F_{X_1}(y) + F_{X_2}(y) - F_{X_1X_2}(y, y).
$$

If $X_1$ and $X_2$ are independent, we have

$$
F_Y(y) = F_{X_1}(y) + F_{X_2}(y) - F_{X_1}(y) F_{X_2}(y),
$$

and

$$
f_Y(y) = \frac{dF_Y(y)}{dy} = f_{X_1}(y)[1 - F_{X_2}(y)] + f_{X_2}(y)[1 - F_{X_1}(y)].
$$
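The independent-case result $F_Y(y) = F_{X_1}(y) + F_{X_2}(y) - F_{X_1}(y)F_{X_2}(y)$ can be checked by simulation. The sketch assumes, as a hypothetical example, exponential $X_1$ and $X_2$ with rates `lam1` and `lam2`. In that special case the formula reduces to $1 - e^{-(\lambda_1 + \lambda_2)y}$, and the last assertion in the test checks this.

```python
import math
import random

lam1, lam2 = 1.0, 3.0  # hypothetical rates of independent exponential X1 and X2

def F1(y):
    return 1 - math.exp(-lam1 * y) if y > 0 else 0.0

def F2(y):
    return 1 - math.exp(-lam2 * y) if y > 0 else 0.0

def F_min(y):
    """F_Y(y) = F_X1(y) + F_X2(y) - F_X1(y) F_X2(y) for Y = min(X1, X2), independent case."""
    return F1(y) + F2(y) - F1(y) * F2(y)

rng = random.Random(11)
n = 100_000
mins = [min(rng.expovariate(lam1), rng.expovariate(lam2)) for _ in range(n)]
y0 = 0.2
emp = sum(1 for m in mins if m <= y0) / n
print(emp, F_min(y0))
```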
We have not given examples in which functions of more than two random
variables are involved. Although more complicated problems can be
formulated in a similar fashion, it is in general more difficult to identify appropriate
regions $R^n$ required by Equation (5.42), and the integrals are, of course, more
difficult to carry out. In principle, however, no intrinsic difficulties present
themselves in cases of functions of more than two random variables.


5.2.1 SUMS OF RANDOM VARIABLES

One of the most important transformations we encounter is a sum of random
variables. It has been discussed in Chapter 4 in the context of characteristic
functions. In fact, the technique of characteristic functions remains the
most powerful technique for sums of independent random variables.

In this section, the procedure presented above is used to give an
alternate method of attack.

Consider the sum

$$
Y = g(X_1, \ldots, X_n) = X_1 + X_2 + \cdots + X_n. \qquad (5.52)
$$

It suffices to determine $f_Y(y)$ for $n = 2$. The result for this case can then be
applied successively to give the probability distribution of a sum of any number
of random variables. For $Y = X_1 + X_2$, Equations (5.41) and (5.42) give

$$
F_Y(y) = \iint_{\{R^2:\, x_1 + x_2 \le y\}} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2,
$$

and, as seen from Figure 5.20,

$$
F_Y(y) = \int_{-\infty}^{\infty}\int_{-\infty}^{y - x_2} f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2. \qquad (5.53)
$$

Upon differentiating with respect to $y$, we obtain

$$
f_Y(y) = \int_{-\infty}^{\infty} f_{X_1X_2}(y - x_2, x_2)\,dx_2. \qquad (5.54)
$$

When $X_1$ and $X_2$ are independent, the above result further reduces to

$$
f_Y(y) = \int_{-\infty}^{\infty} f_{X_1}(y - x_2) f_{X_2}(x_2)\,dx_2. \qquad (5.55)
$$

Integrals of this form arise often in practice. Such an integral is called the convolution
of the functions $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$.
Considerable importance is attached to the results expressed by Equations
(5.54) and (5.55) because sums of random variables occur frequently in
practical situations. By way of recognizing this fact, Equation (5.55) is repeated now
as Theorem 5.3.

Theorem 5.3. Let $Y = X_1 + X_2$, and let $X_1$ and $X_2$ be independent and
continuous random variables. Then the pdf of $Y$ is the convolution of the pdfs
associated with $X_1$ and $X_2$; that is,

$$
f_Y(y) = \int_{-\infty}^{\infty} f_{X_1}(y - x_2) f_{X_2}(x_2)\,dx_2 = \int_{-\infty}^{\infty} f_{X_2}(y - x_1) f_{X_1}(x_1)\,dx_1. \qquad (5.56)
$$

Repeated applications of this formula determine $f_Y(y)$ when $Y$ is a sum of
any number of independent random variables.

Example 5.16. Problem: determine $f_Y(y)$ of $Y = X_1 + X_2$ when $X_1$ and $X_2$ are
independent and identically distributed according to

$$
f_{X_1}(x_1) = \begin{cases} a e^{-a x_1}, & \text{for } x_1 > 0; \\ 0, & \text{elsewhere;} \end{cases} \qquad (5.57)
$$

and similarly for $X_2$.

Answer: Equation (5.56) in this case leads to

$$
f_Y(y) = a^2 \int_0^y e^{-a(y - x_2)} e^{-a x_2}\,dx_2, \quad y > 0, \qquad (5.58)
$$

where the integration limits are determined from the requirements $y - x_2 > 0$
and $x_2 > 0$. The result is

$$
f_Y(y) = \begin{cases} a^2 y e^{-a y}, & \text{for } y > 0; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.59)
$$

Let us note that this problem has also been solved in Example 4.16, by means of
characteristic functions. It is to be stressed again that the method of
characteristic functions is another powerful technique for dealing with sums of
independent random variables. In fact, when the number of random variables involved
in a sum is large, the method of characteristic functions is preferred, since there is
no need to consider only two at a time as required by Equation (5.56).
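The convolution in Equation (5.56) can also be evaluated numerically, which is handy when no closed form is available. The sketch below applies the midpoint rule over a wide interval and lets the zero branches of the pdfs impose the limits $0 < x_2 < y$ automatically. The result is compared with Equation (5.59). The value `a = 1.5`, the interval and the step count are illustrative assumptions.

```python
import math

a = 1.5  # hypothetical value of the parameter a in Equation (5.57)

def f_X(x):
    """Equation (5.57): a e^{-a x} for x > 0, zero elsewhere."""
    return a * math.exp(-a * x) if x > 0 else 0.0

def convolve_at(y, f1, f2, lo, hi, n=40_000):
    """Midpoint-rule value of (5.56): integral of f1(y - x2) f2(x2) dx2 over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x2 = lo + (k + 0.5) * h
        total += f1(y - x2) * f2(x2)
    return h * total

def f_Y_exact(y):
    """Equation (5.59)."""
    return a * a * y * math.exp(-a * y) if y > 0 else 0.0

# Integrating over a wide interval lets the zero branches of f_X impose 0 < x2 < y.
for y in (0.5, 1.0, 2.0):
    print(y, convolve_at(y, f_X, f_X, -5.0, 15.0), f_Y_exact(y))
```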


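Example 5.17. Problem: the random variables $X_1$ and $X_2$ are independent
and uniformly distributed in the intervals $0 < x_1 < 1$ and $0 < x_2 < 2$. Determine
the pdf of $Y = X_1 + X_2$.

Answer: the convolution of $f_{X_1}(x_1) = 1$, $0 < x_1 < 1$, and $f_{X_2}(x_2) = 1/2$,
$0 < x_2 < 2$, results in

$$
f_Y(y) = \int_{-\infty}^{\infty} f_{X_1}(y - x_2) f_{X_2}(x_2)\,dx_2 = \begin{cases} \displaystyle\int_0^{y} \tfrac{1}{2}\,dx_2 = \frac{y}{2}, & \text{for } 0 < y < 1; \\ \displaystyle\int_{y-1}^{y} \tfrac{1}{2}\,dx_2 = \frac{1}{2}, & \text{for } 1 < y < 2; \\ \displaystyle\int_{y-1}^{2} \tfrac{1}{2}\,dx_2 = \frac{3 - y}{2}, & \text{for } 2 < y < 3; \\ 0, & \text{elsewhere.} \end{cases}
$$

In the above, the limits of the integrals are determined from the requirements
$0 < y - x_2 < 1$ and $0 < x_2 < 2$. The shape of $f_Y(y)$ is that of a trapezoid, as
shown in Figure 5.21.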
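The trapezoidal shape can be confirmed by carrying out the convolution (5.56) numerically over the support of $f_{X_2}$ and comparing it with the piecewise result above. The function names and grid size in this sketch are illustrative choices.

```python
def f_X1(x):
    """Uniform pdf on (0, 1)."""
    return 1.0 if 0 < x < 1 else 0.0

def f_X2(x):
    """Uniform pdf on (0, 2)."""
    return 0.5 if 0 < x < 2 else 0.0

def f_Y_piecewise(y):
    """Trapezoidal pdf of Y = X1 + X2 obtained in Example 5.17."""
    if 0 < y <= 1:
        return y / 2
    if 1 < y <= 2:
        return 0.5
    if 2 < y < 3:
        return (3 - y) / 2
    return 0.0

def f_Y_numeric(y, n=20_000):
    """Midpoint-rule convolution (5.56), integrating over the support 0 < x2 < 2 of f_X2."""
    h = 2 / n
    return h * sum(f_X1(y - (k + 0.5) * h) * f_X2((k + 0.5) * h) for k in range(n))

for y in (0.5, 1.5, 2.5, 3.5):
    print(y, f_Y_numeric(y), f_Y_piecewise(y))
```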


5.3 m FUNCTIONS OF n RANDOM VARIABLES 

We now consider the general transformation given by Equation (5.1), that is,

$$
Y_j = g_j(X_1, \ldots, X_n), \quad j = 1, 2, \ldots, m, \quad m \le n. \qquad (5.60)
$$
Figure 5.21 Probability density function, $f_Y(y)$, in Example 5.17

The problem is to obtain the joint probability distribution of random variables
$Y_j$, $j = 1, 2, \ldots, m$, which arise as functions of $n$ jointly distributed random
variables $X_k$, $k = 1, \ldots, n$. As before, we are primarily concerned with the case
in which $X_1, \ldots, X_n$ are continuous random variables.

In order to develop pertinent formulae, the case of $m = n$ is considered first.
We will see that the results obtained for this case encompass situations in which
$m < n$.

Let $X$ and $Y$ be two $n$-dimensional random vectors with components
$(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$, respectively. A vector equation representing
Equation (5.60) is

$$
Y = g(X), \qquad (5.61)
$$

where vector $g(X)$ has as components $g_1(X), g_2(X), \ldots, g_n(X)$. We first consider
the case in which the functions $g_j$ in $g$ are continuous with respect to each of their
arguments, have continuous partial derivatives, and define one-to-one
mappings. It then follows that the inverse functions $g_j^{-1}$ of $g_j$, defined by

$$
X = g^{-1}(Y), \qquad (5.62)
$$

exist and are unique. They also have continuous partial derivatives.

In order to determine $f_Y(y)$ in terms of $f_X(x)$, we observe that, if a closed
region $R_X^n$ in the range space of $X$ is mapped into a closed region $R_Y^n$ in the
range space of $Y$ under transformation $g$, the conservation of probability gives

$$
\int \cdots \int_{R_X^n} f_X(x)\,dx = \int \cdots \int_{R_Y^n} f_Y(y)\,dy, \qquad (5.63)
$$

where the integrals represent $n$-fold integrals with respect to the components of
$x$ and $y$, respectively. Following the usual rule of change of variables in multiple
integrals, we can write (for example, see Courant, 1937):

$$
\int \cdots \int_{R_X^n} f_X(x)\,dx = \int \cdots \int_{R_Y^n} f_X[g^{-1}(y)]\,|J|\,dy, \qquad (5.64)
$$

where $J$ is the Jacobian of the transformation, defined as the determinant

$$
J = \begin{vmatrix} \dfrac{\partial g_1^{-1}}{\partial y_1} & \dfrac{\partial g_1^{-1}}{\partial y_2} & \cdots & \dfrac{\partial g_1^{-1}}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_n^{-1}}{\partial y_1} & \dfrac{\partial g_n^{-1}}{\partial y_2} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_n} \end{vmatrix}. \qquad (5.65)
$$

As a point of clarification, let us note that the vertical lines in Equation (5.65)
denote a determinant and those in Equation (5.64) represent absolute value.

Equations (5.63) and (5.64) then lead to the desired formula:

$$
f_Y(y) = f_X[g^{-1}(y)]\,|J|. \qquad (5.66)
$$

This result is stated as Theorem 5.4.

Theorem 5.4. For the transformation given by Equation (5.61), where $X$ is a
continuous random vector and $g$ is continuous with continuous partial
derivatives and defines a one-to-one mapping, the jpdf of $Y$, $f_Y(y)$, is given by

$$
f_Y(y) = f_X[g^{-1}(y)]\,|J|, \qquad (5.67)
$$

where $J$ is defined by Equation (5.65).

It is of interest to note that Equation (5.67) is an extension of Equation
(5.12), which is for the special case of $n = 1$. Similarly, Equation (5.24), which
covers the $n = 1$ case when the transformation admits more than one root, can also
be extended. Reasoning as we have done in deriving Equation (5.24), we
have Theorem 5.5.


Theorem 5.5. In Theorem 5.4, suppose that the transformation $y = g(x)$ admits at
most a countable number of roots $x_1 = g_1^{-1}(y), x_2 = g_2^{-1}(y), \ldots$. Then

$$
f_Y(y) = \sum_{j=1}^{r} f_X[g_j^{-1}(y)]\,|J_j|, \qquad (5.68)
$$


Fundamentals of Probability and Statistics for Engineers


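where $r$ is the number of solutions for $x$ of the equation $y = g(x)$, and $J_j$ is
defined by

$$
J_j = \begin{vmatrix} \dfrac{\partial g_{j1}^{-1}}{\partial y_1} & \dfrac{\partial g_{j1}^{-1}}{\partial y_2} & \cdots & \dfrac{\partial g_{j1}^{-1}}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_{jn}^{-1}}{\partial y_1} & \dfrac{\partial g_{jn}^{-1}}{\partial y_2} & \cdots & \dfrac{\partial g_{jn}^{-1}}{\partial y_n} \end{vmatrix}. \qquad (5.69)
$$

In the above, $g_{j1}, g_{j2}, \ldots$, and $g_{jn}$ are components of $g_j$.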

As we mentioned earlier, the results presented above can also be applied to
the case in which the dimension of $Y$ is smaller than that of $X$. Consider the
transformation represented in Equation (5.60) in which $m < n$. In order to use
the formulae developed above, we first augment the $m$-dimensional random
vector $Y$ by another $(n - m)$-dimensional random vector $Z$. The vector $Z$ can
be constructed as a simple function of $X$ in the form

$$
Z = h(X), \qquad (5.70)
$$

where $h$ satisfies conditions of continuity and continuity in partial derivatives.
On combining Equations (5.60) and (5.70), we now have an $n$-random-variable
to $n$-random-variable transformation, and the jpdf of $Y$ and $Z$ can be obtained
by means of Equation (5.67) or Equation (5.68). The jpdf of $Y$ alone is then
found through integration with respect to the components of $Z$.

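Example 5.18. Problem: let random variables $X_1$ and $X_2$ be independent and
identically and normally distributed according to

$$
f_{X_1}(x_1) = \frac{1}{(2\pi)^{1/2}} \exp\left(-\frac{x_1^2}{2}\right), \quad -\infty < x_1 < \infty,
$$

and similarly for $X_2$. Determine the jpdf of $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.

Answer: Equation (5.67) applies in this case. The solutions for $x_1$ and $x_2$ in
terms of $y_1$ and $y_2$ are

$$
x_1 = g_1^{-1}(y) = \frac{y_1 + y_2}{2}, \qquad x_2 = g_2^{-1}(y) = \frac{y_1 - y_2}{2}. \qquad (5.71)
$$

The Jacobian in this case takes the form

$$
J = \begin{vmatrix} \dfrac{\partial g_1^{-1}}{\partial y_1} & \dfrac{\partial g_1^{-1}}{\partial y_2} \\ \dfrac{\partial g_2^{-1}}{\partial y_1} & \dfrac{\partial g_2^{-1}}{\partial y_2} \end{vmatrix} = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\frac{1}{2}.
$$

Hence, Equation (5.67) leads to

$$
\begin{aligned}
f_{Y_1Y_2}(y_1, y_2) &= f_{X_1}[g_1^{-1}(y)]\, f_{X_2}[g_2^{-1}(y)]\,|J| \\
&= \frac{1}{2}\left(\frac{1}{2\pi}\right) \exp\left[-\frac{(y_1 + y_2)^2}{8}\right] \exp\left[-\frac{(y_1 - y_2)^2}{8}\right] \\
&= \frac{1}{4\pi} \exp\left[-\frac{y_1^2 + y_2^2}{4}\right], \quad (-\infty, -\infty) < (y_1, y_2) < (\infty, \infty). \qquad (5.72)
\end{aligned}
$$

It is of interest to note that the result given by Equation (5.72) can be written as

$$
f_{Y_1Y_2}(y_1, y_2) = f_{Y_1}(y_1)\, f_{Y_2}(y_2), \qquad (5.73)
$$

where

$$
f_{Y_1}(y_1) = \frac{1}{2\pi^{1/2}} \exp\left(-\frac{y_1^2}{4}\right), \quad -\infty < y_1 < \infty,
$$

$$
f_{Y_2}(y_2) = \frac{1}{2\pi^{1/2}} \exp\left(-\frac{y_2^2}{4}\right), \quad -\infty < y_2 < \infty,
$$

implying that, although $Y_1$ and $Y_2$ are both functions of $X_1$ and $X_2$, they are
independent and identically and normally distributed.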
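Theorem 5.4 can be exercised pointwise. The sketch below evaluates Equation (5.67) for Example 5.18: the inverse map (5.71), times the absolute value of the 2 × 2 Jacobian determinant, times the product of standard normal pdfs. It then confirms at a few arbitrarily chosen points that the result equals the product of two $N(0, 2)$ densities, as in Equation (5.73). The helper names are illustrative.

```python
import math

def normal_pdf(x, var=1.0):
    """Normal pdf with mean 0 and variance var."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def f_Y1Y2(y1, y2):
    """Equation (5.67) for Y1 = X1 + X2, Y2 = X1 - X2 with standard normal X1, X2."""
    x1, x2 = (y1 + y2) / 2, (y1 - y2) / 2   # inverse map, Equation (5.71)
    J = 0.5 * (-0.5) - 0.5 * 0.5            # det [[1/2, 1/2], [1/2, -1/2]] = -1/2
    return normal_pdf(x1) * normal_pdf(x2) * abs(J)

points = [(0.0, 0.0), (1.0, -0.5), (2.0, 1.5), (-3.0, 0.7)]
for y1, y2 in points:
    product = normal_pdf(y1, var=2.0) * normal_pdf(y2, var=2.0)  # Equation (5.73)
    print(y1, y2, f_Y1Y2(y1, y2), product)
```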

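Example 5.19. Problem: for the same distributions assigned to $X_1$ and $X_2$ in
Example 5.18, determine the jpdf of $Y_1 = (X_1^2 + X_2^2)^{1/2}$ and $Y_2 = X_1/X_2$.

Answer: let us first note that $Y_1$ takes values only in the positive range.
Hence,

$$
f_{Y_1Y_2}(y_1, y_2) = 0, \quad y_1 < 0.
$$

For $y_1 > 0$, the transformation $y = g(x)$ admits two solutions. They are:

$$
x_{11} = g_{11}^{-1}(y) = \frac{y_1 y_2}{(1 + y_2^2)^{1/2}}, \qquad x_{12} = g_{12}^{-1}(y) = \frac{y_1}{(1 + y_2^2)^{1/2}},
$$

and

$$
x_{21} = g_{21}^{-1}(y) = -x_{11}, \qquad x_{22} = g_{22}^{-1}(y) = -x_{12}.
$$

Equation (5.68) now applies and we have

$$
f_{Y_1Y_2}(y_1, y_2) = f_X[g_1^{-1}(y)]\,|J_1| + f_X[g_2^{-1}(y)]\,|J_2|, \qquad (5.74)
$$

where

$$
f_X(x) = f_{X_1}(x_1) f_{X_2}(x_2) = \frac{1}{2\pi} \exp\left[-\frac{x_1^2 + x_2^2}{2}\right], \qquad (5.75)
$$

$$
J_1 = J_2 = \begin{vmatrix} \dfrac{\partial g_{11}^{-1}}{\partial y_1} & \dfrac{\partial g_{11}^{-1}}{\partial y_2} \\ \dfrac{\partial g_{12}^{-1}}{\partial y_1} & \dfrac{\partial g_{12}^{-1}}{\partial y_2} \end{vmatrix} = -\frac{y_1}{1 + y_2^2}.
$$

Since $x_1^2 + x_2^2 = y_1^2$ at both roots, Equation (5.74) gives, for $y_1 \ge 0$ and $-\infty < y_2 < \infty$,

$$
f_{Y_1Y_2}(y_1, y_2) = 2\left(\frac{1}{2\pi}\right) e^{-y_1^2/2}\, \frac{y_1}{1 + y_2^2} = \frac{y_1}{\pi(1 + y_2^2)}\, e^{-y_1^2/2}.
$$

We note that the result can again be expressed as the product of $f_{Y_1}(y_1)$ and
$f_{Y_2}(y_2)$, with

$$
f_{Y_1}(y_1) = \begin{cases} y_1 e^{-y_1^2/2}, & \text{for } y_1 \ge 0; \\ 0, & \text{elsewhere;} \end{cases}
\qquad
f_{Y_2}(y_2) = \frac{1}{\pi(1 + y_2^2)}, \quad -\infty < y_2 < \infty.
$$

Thus random variables $Y_1$ and $Y_2$ are again independent in this case, where $Y_1$
has the so-called Rayleigh distribution and $Y_2$ is Cauchy distributed. We remark
that the factor $(1/\pi)$ is assigned to $f_{Y_2}(y_2)$ to make the area under each pdf
equal to 1.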
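A simulation illustrates both conclusions of Example 5.19: the marginal laws are Rayleigh and Cauchy, and the joint PDF factorizes. The sketch below compares empirical probabilities from standard normal pairs with the closed-form CDFs $1 - e^{-r^2/2}$ and $\tfrac{1}{2} + \tan^{-1}(c)/\pi$, obtained by integrating the pdfs above. It then checks the factorization at one arbitrary point $(r_0, c_0)$. The seed, sample size and test point are illustrative choices.

```python
import math
import random

rng = random.Random(3)
n = 100_000
samples = []
for _ in range(n):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    samples.append((math.hypot(x1, x2), x1 / x2))  # (Y1, Y2) of Example 5.19

def F_rayleigh(r):
    """Integral of f_Y1(y1) = y1 exp(-y1**2 / 2) from 0 to r."""
    return 1 - math.exp(-r * r / 2) if r > 0 else 0.0

def F_cauchy(c):
    """Integral of f_Y2(y2) = 1 / (pi (1 + y2**2)) from -infinity to c."""
    return 0.5 + math.atan(c) / math.pi

r0, c0 = 1.0, 0.5
p_r = sum(1 for r, _ in samples if r <= r0) / n
p_c = sum(1 for _, c in samples if c <= c0) / n
p_joint = sum(1 for r, c in samples if r <= r0 and c <= c0) / n
print(p_r, F_rayleigh(r0))
print(p_c, F_cauchy(c0))
print(p_joint, F_rayleigh(r0) * F_cauchy(c0))  # independence: the joint PDF factorizes
```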

Example 5.20. Let us determine the pdf of $Y$ considered in Example 5.11 by
using the formulae developed in this section. The transformation is

$$
Y = X_1 X_2. \qquad (5.78)
$$

In order to conform with the conditions stated in this section, we augment
Equation (5.78) by some simple transformation such as

$$
Z = X_2. \qquad (5.79)
$$

The random variables $Y$ and $Z$ now play the role of $Y_1$ and $Y_2$ in Equation
(5.67), and we have

$$
f_{YZ}(y, z) = f_{X_1X_2}[g_1^{-1}(y, z), g_2^{-1}(y, z)]\,|J|, \qquad (5.80)
$$

where

$$
g_1^{-1}(y, z) = \frac{y}{z}, \qquad g_2^{-1}(y, z) = z,
$$

$$
J = \begin{vmatrix} \dfrac{1}{z} & -\dfrac{y}{z^2} \\ 0 & 1 \end{vmatrix} = \frac{1}{z}.
$$

Using the specific forms of $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$ given in Example 5.11, Equation
(5.80) becomes

$$
f_{YZ}(y, z) = f_{X_1}\!\left(\frac{y}{z}\right) f_{X_2}(z)\,\frac{1}{z} = \begin{cases} \dfrac{2y}{z^2}\left(1 - \dfrac{z}{2}\right) = \dfrac{y(2 - z)}{z^2}, & \text{for } 0 < y < 2 \text{ and } y < z < 2; \\ 0, & \text{elsewhere.} \end{cases} \qquad (5.81)
$$

Finally, the pdf $f_Y(y)$ is found by integrating Equation (5.81) with
respect to $z$:

$$
f_Y(y) = \int_{-\infty}^{\infty} f_{YZ}(y, z)\,dz = \begin{cases} \displaystyle\int_y^2 \frac{y(2 - z)}{z^2}\,dz = 2 + y(\ln y - 1 - \ln 2), & \text{for } 0 < y < 2; \\ 0, & \text{elsewhere.} \end{cases}
$$

This result agrees with that given in Equation (5.47) in Example 5.11.
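The final integration over $z$ can be reproduced numerically. The sketch below integrates the augmented jpdf (5.81) over $y < z < 2$ with the midpoint rule and compares the marginal with the closed form (5.47) at a few values of $y$. At $y = 1$ the exact value is $1 - \ln 2$. The step count and sample points are illustrative.

```python
import math

def f_YZ(y, z):
    """Equation (5.81): jpdf of Y = X1*X2 and the auxiliary variable Z = X2."""
    if 0 < y < 2 and y < z < 2:
        return y * (2 - z) / z ** 2
    return 0.0

def f_Y_marginal(y, n=20_000):
    """Integrate f_YZ(y, z) over z in (y, 2) by the midpoint rule."""
    if not 0 < y < 2:
        return 0.0
    h = (2 - y) / n
    return h * sum(f_YZ(y, y + (k + 0.5) * h) for k in range(n))

def f_Y_closed(y):
    """Equation (5.47)."""
    return 2 + y * (math.log(y) - 1 - math.log(2)) if 0 < y < 2 else 0.0

for y in (0.25, 1.0, 1.75):
    print(y, f_Y_marginal(y), f_Y_closed(y))
```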


REFERENCE 

Courant, R., 1937, Differential and Integral Calculus, Volume II, Wiley-Interscience,
New York. 
PROBLEMS

5.1 Determine the probability distribution function (PDF) of $Y = 3X - 1$ if
(a) Case 1:

$$
F_X(x) = \begin{cases} 0, & \text{for } x < 3; \\ \ldots, & \text{for } 3 \le x \le 6; \\ 1, & \text{for } x > 6. \end{cases}
$$

(b) Case 2:

5.2 Temperature $C$ measured in degrees Celsius is related to temperature $X$ in degrees
Fahrenheit by $C = 5(X - 32)/9$. Determine the probability density function (pdf) of
$C$ if $X$ is random and is distributed uniformly in the interval (86, 95).

5.3 The random variable $X$ has a triangular distribution as shown in Figure 5.22.
Determine the pdf of $Y = 3X + 2$.

5.4 Determine $F_Y(y)$ in terms of $F_X(x)$ if $Y = X^{1/2}$, where $F_X(x) = 0$, $x < 0$.

5.5 A random variable $Y$ has a 'log-normal' distribution if it is related to $X$ by $Y = e^X$,
where $X$ is normally distributed according to

$$
f_X(x) = \frac{1}{\sigma(2\pi)^{1/2}} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right], \quad -\infty < x < \infty.
$$

Determine the pdf of $Y$ for $m = 0$ and $\sigma = 1$.


Figure 5.22 Distribution of $X$, for Problem 5.3


Functions of Random Variables


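5.6 The following arises in the study of earthquake-resistant design. If $X$ is the
magnitude of an earthquake and $Y$ is ground-motion intensity at distance $c$ from the
earthquake, $X$ and $Y$ may be related by

$$
Y = c e^X.
$$

If $X$ has the distribution

$$
f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x > 0; \\ 0, & \text{elsewhere:} \end{cases}
$$

(a) Show that the PDF of $Y$, $F_Y(y)$, is

$$
F_Y(y) = \begin{cases} 1 - \left(\dfrac{c}{y}\right)^{\lambda}, & \text{for } y > c; \\ 0, & \text{elsewhere.} \end{cases}
$$

(b) What is $f_Y(y)$?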

5.7 The risk $R$ of an accident for a vehicle traveling at a 'constant' speed $V$ is given by


where $a$, $b$, and $c$ are positive constants. Suppose that speed $V$ of a class of vehicles is
random and is uniformly distributed between $v_1$ and $v_2$. Determine the pdf of $R$ if (a)
$(v_1, v_2) > c$, and (b) $v_1$ and $v_2$ are such that $c = (v_1 + v_2)/2$.

5.8 Let $Y = g(X)$, with $X$ uniformly distributed over the interval $a < x < b$. Suppose
that the inverse function $X = g^{-1}(Y)$ is a single-valued function of $Y$ in the interval
$g(a) < y < g(b)$. Show that the pdf of $Y$ is

$$
f_Y(y) = \begin{cases} \dfrac{1}{b - a}\, \dfrac{1}{g'[g^{-1}(y)]}, & \text{for } g(a) < y < g(b); \\ 0, & \text{elsewhere,} \end{cases}
$$

where $g'(x) = dg(x)/dx$.

5.9 A rectangular plate of area $a$ is situated in a flow stream at an angle $\theta$ measured from
the streamline, as shown in Figure 5.23. Assuming that $\theta$ is uniformly distributed
from 0 to $\pi/2$, determine the pdf of the projected area perpendicular to the stream.
5.10 At a given location, the PDF of annual wind speed, $V$, in miles per hour is found
to be


1 

r 

/ y \-6.96 

Fv{v) = \ 

1 exp 

“V3^/ 

1 

lo, 

elsewhere. 


The wind force $W$ exerted on structures is proportional to $V^2$. Let $W = aV^2$.

(a) Determine the pdf of $W$ and its mean and variance by using $f_W(w)$.

(b) Determine the mean and variance of $W$ directly from the knowledge of $F_V(v)$.

5.11 An electrical device called a full-wave rectifier transforms input $X$ to the device
into output $Y$ according to $Y = |X|$. If input $X$ has a pdf of the form

$$
f_X(x) = \frac{1}{(2\pi)^{1/2}} \exp\left[-\frac{(x - 1)^2}{2}\right], \quad -\infty < x < \infty,
$$

(a) Determine the pdf of $Y$ and its mean and variance using $f_Y(y)$.

(b) Determine the mean and variance of $Y$ directly from the knowledge of $f_X(x)$.

5.12 An electrical device gives output $Y$ in terms of input $X$ according to

$$
Y = g(X) = \begin{cases} 1, & \text{for } X \ge 0; \\ 0, & \text{for } X < 0. \end{cases}
$$

Is random variable $Y$ continuous or discrete? Determine its probability distribution
in terms of the pdf of $X$.

5.13 The kinetic energy of a particle with mass $m$ and velocity $V$ is given by

$$
X = \frac{1}{2} m V^2.
$$

Suppose that $m$ is deterministic and $V$ is random with pdf given by

$$
f_V(v) = \begin{cases} \ldots, & \text{for } v > 0; \\ 0, & \text{elsewhere.} \end{cases}
$$

Determine the pdf of $X$.

5.14 The radius $R$ of a sphere is known to be distributed uniformly in the range
$0.99r_0 < r < 1.01r_0$. Determine the pdfs of (a) its surface area and (b) its
volume.

5.15 A resistor to be used as a component in a simple electrical circuit is randomly
chosen from a stock for which resistance $R$ has the pdf

$$
f_R(r) = \begin{cases} a^2 r e^{-ar}, & \text{for } r > 0; \\ 0, & \text{elsewhere.} \end{cases}
$$

Suppose that voltage source $v$ in the circuit is a deterministic constant.

(a) Find the pdf of current $I$, where $I = v/R$, passing through the circuit.

(b) Find the pdf of power $W$, where $W = I^2R$, dissipated in the resistor.

5.16 The independent random variables $X_1$ and $X_2$ are uniformly and identically
distributed, with pdfs

$$
f_{X_1}(x_1) = \begin{cases} \tfrac{1}{2}, & \text{for } -1 < x_1 < 1; \\ 0, & \text{elsewhere;} \end{cases}
$$

and similarly for $X_2$. Let $Y = X_1 + X_2$.

(a) Determine the pdf of $Y$ by using Equation (5.56).

(b) Determine the pdf of $Y$ by using the method of characteristic functions
developed in Section 4.5.

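5.17 Two random variables, $T_1$ and $T_2$, are independent and exponentially distributed
according to

$$
f_{T_1}(t_1) = \begin{cases} 2e^{-2t_1}, & \text{for } t_1 > 0; \\ 0, & \text{elsewhere;} \end{cases}
\qquad
f_{T_2}(t_2) = \begin{cases} 2e^{-2t_2}, & \text{for } t_2 > 0; \\ 0, & \text{elsewhere.} \end{cases}
$$

(a) Determine the pdf of $T = T_1 - T_2$.

(b) Determine $m_T$ and $\sigma_T^2$.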

5.18 A discrete random variable $X$ has a binomial distribution with parameters $(n, p)$. Its
probability mass function (pmf) has the form

$$
p_X(k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \ldots, n.
$$

Show that, if $X_1$ and $X_2$ are independent and have binomial distributions with
parameters $(n_1, p)$ and $(n_2, p)$, respectively, the sum $Y = X_1 + X_2$ has a binomial
distribution with parameters $(n_1 + n_2, p)$.

5.19 Consider the sum of two independent random variables $X_1$ and $X_2$, where $X_1$ is
discrete, taking values $a$ and $b$ with probabilities $P(X_1 = a) = p$ and
$P(X_1 = b) = q$ ($p + q = 1$), and $X_2$ is continuous with pdf $f_{X_2}(x_2)$.

(a) Show that $Y = X_1 + X_2$ is a continuous random variable with pdf

$$
f_Y(y) = p f_{Y_1}(y) + q f_{Y_2}(y),
$$

where $f_{Y_1}(y)$ and $f_{Y_2}(y)$ are, respectively, the pdfs of $Y_1 = a + X_2$ and $Y_2 = b + X_2$
at $y$.

(b) Plot/y(y) by letting fl = 0, = 1, p = 5 ,^ = |, and 

fx2 (■^2) = ’ -00 <X2< 00 
Figure 5.24 Parallel arrangement of components A and B, for Problem 5.20

5.20 Consider a system with a parallel arrangement, as shown in Figure 5.24, and let A
be the primary component and B its redundant mate (backup component). The
operating lives of A and B are denoted by $T_1$ and $T_2$, respectively, and they follow
the exponential distributions

$$
f_{T_1}(t_1) = \begin{cases} a_1 e^{-a_1 t_1}, & \text{for } t_1 > 0; \\ 0, & \text{elsewhere;} \end{cases}
\qquad
f_{T_2}(t_2) = \begin{cases} a_2 e^{-a_2 t_2}, & \text{for } t_2 > 0; \\ 0, & \text{elsewhere.} \end{cases}
$$

Let the life of the system be denoted by $T$. Then $T = T_1 + T_2$ if the redundant part
comes into operation only when the primary component fails (so-called 'cold
redundancy'), and $T = \max(T_1, T_2)$ if the redundant part is kept in a ready
condition at all times so that delay is minimized in the event of changeover from the
primary component to its redundant mate (so-called 'hot redundancy').

(a) Let $T_C = T_1 + T_2$ and $T_H = \max(T_1, T_2)$. Determine their respective
probability density functions.

(b) Suppose that we wish to maximize the probability $P(T > t)$ for some $t$. Which
type of redundancy is preferred?

5.21 Consider a system with components arranged in series, as shown in Figure 5.25,
and let $T_1$ and $T_2$ be independent random variables, representing the operating
lives of A and B, for which the pdfs are given in Problem 5.20. Determine the pdf of
system life $T = \min(T_1, T_2)$. Generalize to the case of $n$ components in series.

5.22 At a taxi stand, the number $X_1$ of taxis arriving during some time interval has a
Poisson distribution with pmf given by

$$
p_{X_1}(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots,
$$

Figure 5.25 Components A and B arranged in series, for Problem 5.21
where $\lambda$ is a constant. Suppose that demand $X_2$ at this location during the same
time interval has the same distribution as $X_1$ and is independent of $X_1$. Determine
the pmf of $Y = X_2 - X_1$, where $Y$ represents the excess of taxis in this time interval
(positive and negative).


5.23 Determine the pdf of $Y = |X_1 - X_2|$, where $X_1$ and $X_2$ are independent random
variables with respective pdfs $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$.


5.24 The light intensity $I$ at a given point $X$ distance away from a light source is
$I = C/X^2$, where $C$ is the source candlepower. Determine the pdf of $I$ if the pdfs
of $C$ and $X$ are given by


/c(c) = < 36’ 

C, ^ 

lo, 

elsewhere; 


for 1 < X < 2 ; 

fx{x) = < 


lo, 

elsewhere; 


and C and X are independent. 

5.25 Let $X_1$ and $X_2$ be independent and identically distributed according to

$$
f_{X_1}(x_1) = \frac{1}{(2\pi)^{1/2}} \exp\left(-\frac{x_1^2}{2}\right), \quad -\infty < x_1 < \infty,
$$

and similarly for $X_2$. By means of techniques developed in Section 5.2, determine
the pdf of $Y$, where $Y = (X_1^2 + X_2^2)^{1/2}$. Check your answer with the result obtained
in Example 5.19. (Hint: use polar coordinates to carry out integration.)

5.26 Extend the result of Problem 5.25 to the case of three independent and identically
distributed random variables, that is, $Y = (X_1^2 + X_2^2 + X_3^2)^{1/2}$. (Hint: use spherical
coordinates to carry out integration.)

5.27 The joint probability density function (jpdf) of random variables $X_1$, $X_2$, and $X_3$
takes the form


.fxiX2Xi{xi^X2,X2) 


--for (vi,v: 2 ,X 3 ) > ( 0 , 0 , 0 ); 

(1 X\ + X 2 + X 3 ) 

0 , elsewhere. 


Eind the pdf of T = Xi + Xj + X 3 . 

5.28 The pdfs of two independent random variables $X_1$ and $X_2$ are

$$f_{X_1}(x_1) = \begin{cases} e^{\ldots}, & \text{for } x_1 > 0; \\ 0, & \text{for } x_1 < 0; \end{cases} \qquad f_{X_2}(x_2) = \begin{cases} \ldots, & \text{for } x_2 > 0; \\ 0, & \text{for } x_2 < 0. \end{cases}$$

Determine the jpdf of $Y_1$ and $Y_2$, defined by

$$Y_1 = X_1 + X_2, \qquad Y_2 = \frac{\ldots}{Y_1},$$

and show that they are independent.
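5.29 The jpdf of $X$ and $Y$ is given by

$$f_{XY}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{x^2 + y^2}{2\sigma^2}\right], \quad (-\infty, -\infty) < (x, y) < (\infty, \infty).$$

Determine the jpdf of $R$ and $\Phi$ and their respective marginal pdfs, where
$R = (X^2 + Y^2)^{1/2}$ is the vector length and $\Phi = \tan^{-1}(Y/X)$ is the phase angle. Are
$R$ and $\Phi$ independent?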

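5.30 Show that an alternate formula for Equation (5.67) is

$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}[\mathbf{g}^{-1}(\mathbf{y})]\,|J|^{-1},$$

where

$$J = \begin{vmatrix} \dfrac{\partial g_1(\mathbf{x})}{\partial x_1} & \dfrac{\partial g_1(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial g_1(\mathbf{x})}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_n(\mathbf{x})}{\partial x_1} & \dfrac{\partial g_n(\mathbf{x})}{\partial x_2} & \cdots & \dfrac{\partial g_n(\mathbf{x})}{\partial x_n} \end{vmatrix}$$

is evaluated at $\mathbf{x} = \mathbf{g}^{-1}(\mathbf{y})$. Similar alternate forms hold for Equations (5.12), (5.24),
and (5.68).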
Some Important Discrete Distributions


This chapter deals with some distributions of discrete random variables that are 
important as models of scientific phenomena. The nature and applications of 
these distributions are discussed. An understanding of the situations in which 
these random variables arise enables us to choose an appropriate distribution 
for a scientific phenomenon under consideration. Thus, this chapter is also 
concerned with the induction step discussed in Chapter 1, by which a model 
is chosen on the basis of factual understanding of the physical phenomenon 
under study (step B to C in Figure 1.1). 

Some important distributions of continuous random variables will be studied 
in Chapter 7. 


6.1 BERNOULLI TRIALS 

A large number of practical situations can be described by the repeated
performance of a random experiment of the following basic nature: a sequence
of trials is performed so that (a) for each trial, there are only two possible
outcomes, say, success and failure; (b) the probabilities of the occurrence of
these outcomes remain the same throughout the trials; and (c) the trials are
carried out independently. Trials performed under these conditions are called
Bernoulli trials. Despite the simplicity of the situation, mathematical models
arising from this basic random experiment have wide applicability. In fact,
we have encountered Bernoulli trials in the random walk problems described
in Examples 3.5 (page 52) and 4.17 (page 106) and also in the traffic problem
examined in Example 3.9 (page 64). More examples will be given in the
sections to follow.

Let us denote event ‘success’ by S, and event ‘failure’ by F. Also, let P(S) = p
and P(F) = q, where p + q = 1. Possible outcomes resulting from performing
a sequence of Bernoulli trials can be symbolically represented by

SSFFSFSSS⋯FF
FSFSSFFFS⋯SF

and, owing to independence, the probabilities of these possible outcomes are
easily computed. For example,

$$P(SSFFSF\cdots FF) = P(S)P(S)P(F)P(F)P(S)P(F)\cdots P(F)P(F) = ppqqpq\cdots qq.$$

A number of these possible outcomes with their associated probabilities are 
of practical interest. We introduce three important distributions in this connection. 


6.1.1 BINOMIAL DISTRIBUTION 

The probability distribution of a random variable X representing the number of
successes in a sequence of n Bernoulli trials, regardless of the order in which they
occur, is frequently of considerable interest. It is clear that X is a discrete random
variable, assuming values 0, 1, 2, ..., n. In order to determine its probability mass
function, consider $p_X(k)$, the probability of having exactly k successes in n trials.
This event can occur in as many ways as k letters S can be placed in n boxes.
Now, we have n choices for the position of the first S, n − 1 choices for the
second S, ..., and, finally, n − k + 1 choices for the position of the kth S. The
total number of possible arrangements is thus n(n − 1)⋯(n − k + 1). However,
as no distinction is made of the Ss that are in the occupied positions, we must
divide the number obtained above by the number of ways in which k Ss can be
arranged in k boxes, that is, k(k − 1)⋯1 = k!. Hence, the number of ways in
which k successes can happen in n trials is

$$\frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!(n-k)!}, \tag{6.1}$$

and the probability associated with each is $p^k q^{n-k}$. Hence, we have

$$p_X(k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n, \tag{6.2}$$
where

$$\binom{n}{k} = \frac{n!}{k!(n-k)!} \tag{6.3}$$

is the binomial coefficient in the binomial theorem

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \tag{6.4}$$


In view of its similarity in appearance to the terms of the binomial theorem, 
the distribution defined by Equation (6.2) is called the binomial distribution. 
It has two parameters, namely, n and p. Owing to the popularity of this 
distribution, a random variable X having a binomial distribution is often 
denoted by B(n,p). 

The shape of a binomial distribution is determined by the values assigned 
to its two parameters, n and p. In general, n is given as a part of the problem 
statement and p must be estimated from observations. 

A plot of the probability mass function (pmf), $p_X(k)$, has been shown in Example
3.2 (page 43) for n = 10 and p = 0.2. The peak of the distribution will shift to
the right as p increases, reaching a symmetrical distribution when p = 0.5. More
insight into the behavior of $p_X(k)$ can be gained by taking the ratio

$$\frac{p_X(k)}{p_X(k-1)} = \frac{(n-k+1)p}{kq} = 1 + \frac{(n+1)p - k}{kq}. \tag{6.5}$$

We see from Equation (6.5) that $p_X(k)$ is greater than $p_X(k-1)$ when
k < (n + 1)p and is smaller when k > (n + 1)p. Accordingly, if we define integer
$k^*$ by

$$(n + 1)p - 1 < k^* \le (n + 1)p, \tag{6.6}$$

the value of $p_X(k)$ increases monotonically and attains its maximum value when
$k = k^*$, then decreases monotonically. If (n + 1)p happens to be an integer, the
maximum value is attained at both $k = k^* - 1$ and $k = k^*$. The integer $k^*$ is
thus a mode of this distribution and is often referred to as ‘the most probable
number of successes’.
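As a numerical check on Equations (6.5) and (6.6), the following Python sketch locates the mode of a binomial pmf by brute force and compares it with the integer $k^*$ of Equation (6.6). The function names are illustrative only, and floating-point evaluation of (n + 1)p is assumed to be accurate enough for the values shown.

```python
from math import comb, floor

def binomial_pmf(k, n, p):
    """p_X(k) of Equation (6.2)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mode(n, p):
    """Integer k* with (n + 1)p - 1 < k* <= (n + 1)p, Equation (6.6)."""
    return floor((n + 1) * p)

n, p = 10, 0.2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
k_star = binomial_mode(n, p)
brute_force = max(range(n + 1), key=lambda k: pmf[k])
print(k_star, brute_force)  # 2 2

# When (n + 1)p is an integer (here 10 * 0.2 = 2), k* - 1 and k* share the maximum
print(binomial_pmf(1, 9, 0.2), binomial_pmf(2, 9, 0.2))
```

The brute-force search is only practical for moderate n; Equation (6.6) gives the mode directly.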

Because of its wide usage, the pmf $p_X(k)$ is widely tabulated as a function of
n and p. Table A.1 in Appendix A gives its values for n = 2, 3, ..., 10, and
p = 0.01, 0.05, ..., 0.50. Let us note that probability tables for the binomial
and other commonly used distributions are now widely available in a number
of computer software packages, and even on some calculators. For example,
function BINOMDIST in Microsoft® Excel™ 2000 gives individual binomial
probabilities given by Equation (6.2). Other statistical functions available in
Excel™ 2000 are listed in Appendix B.

The calculation of Px(k) in Equation (6.2) is cumbersome as n becomes large. 
An approximate way of determining Px(k) for large n has been discussed in 
Example 4.17 (page 106) by means of Stirling’s formula [Equation (4.78)]. 
Poisson approximation to the binomial distribution, to be discussed in Section 
6.3.2, also facilitates probability calculations when n becomes large. 

The probability distribution function (PDF), $F_X(x)$, for a binomial distribution
is also widely tabulated. It is given by

$$F_X(x) = \sum_{k=0}^{m} p_X(k) = \sum_{k=0}^{m} \binom{n}{k} p^k q^{n-k}, \tag{6.7}$$

where m is the largest integer less than or equal to x.
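Equations (6.2) and (6.7) translate directly into code. The Python sketch below is a minimal illustration, with hypothetical helper names, of how the pmf and the PDF can be evaluated without a table, including the convention that m is the largest integer not exceeding x.

```python
from math import comb, floor

def binomial_pmf(k, n, p):
    """p_X(k) from Equation (6.2); zero outside k = 0, 1, ..., n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(x, n, p):
    """F_X(x) from Equation (6.7): sum of p_X(k) for k = 0, ..., m with m = floor(x)."""
    m = floor(x)
    if m < 0:
        return 0.0
    return sum(binomial_pmf(k, n, p) for k in range(min(m, n) + 1))

# n = 10, p = 0.2: F_X(2.5) = p_X(0) + p_X(1) + p_X(2)
print(round(binomial_cdf(2.5, 10, 0.2), 4))  # 0.6778
```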

Other important properties of the binomial distribution have been derived in 
Example 4.1 (page 77), Example 4.5 (page 81), and Example 4.14 (page 99). 
Without giving details, we have, respectively, for the characteristic function, 
mean, and variance, 


$$\phi_X(t) = (pe^{jt} + q)^n, \qquad m_X = np, \qquad \sigma_X^2 = npq. \tag{6.8}$$


The fact that the mean of X is np suggests that parameter p can be estimated based
on the average value of the observed data. This procedure is used in Example 6.2.
We mention, however, that this parameter estimation problem needs to be examined
much more rigorously, and its systematic treatment will be taken up in Part B.

Let us remark here that another formulation leading to the binomial distribution
is to define random variable $X_j$, j = 1, 2, ..., n, to represent the outcome
of the jth Bernoulli trial. If we let

$$X_j = \begin{cases} 0, & \text{if the } j\text{th trial is a failure}, \\ 1, & \text{if the } j\text{th trial is a success}, \end{cases} \tag{6.9}$$

then the sum

$$X = X_1 + X_2 + \cdots + X_n \tag{6.10}$$

gives the number of successes in n trials. By definition, $X_1, \ldots,$ and $X_n$ are
independent random variables.
The moments and distribution of X can be easily found by using Equation
(6.10). Since

$$E\{X_j\} = 0(q) + 1(p) = p, \quad j = 1, 2, \ldots, n,$$

it follows from Equation (4.38) that

$$E\{X\} = p + p + \cdots + p = np, \tag{6.11}$$

which is in agreement with the corresponding expression in Equations (6.8).
Similarly, its variance, characteristic function, and pmf are easily found
following our discussion in Section 4.4 concerning sums of independent random
variables.
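The representation (6.10) also suggests a simple simulation. The sketch below, written in Python with the standard `random` module and an arbitrary fixed seed, builds each sample of X as a sum of Bernoulli indicators $X_j$ and compares the sample mean and variance with np and npq. It is meant only as a sanity check, not as an estimation procedure.

```python
import random

def simulate_binomial(n, p, trials, seed=0):
    """Sample X = X_1 + ... + X_n, Equation (6.10); return sample mean and variance."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # X_j = 1 with probability p (success) and 0 otherwise, Equation (6.9)
        samples.append(sum(1 if rng.random() < p else 0 for _ in range(n)))
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, var

mean, var = simulate_binomial(n=20, p=0.2, trials=20_000)
print(mean, var)  # expected to lie near np = 4 and npq = 3.2
```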

We have seen binomial distributions in Example 3.5 (page 52), Example 3.9
(page 64), and Example 4.11 (page 96). Applications of the binomial distribution
in other areas are further illustrated by the following examples.

Example 6.1. Problem: a homeowner has just installed 20 light bulbs in a new 
home. Suppose that each has a probability 0.2 of functioning more than three 
months. What is the probability that at least five of these function more than 
three months? What is the average number of bulbs the homeowner has to 
replace in three months? 

Answer: it is reasonable to assume that the light bulbs perform independently.
If X is the number of bulbs functioning more than three months
(success), it has a binomial distribution with n = 20 and p = 0.2. The answer
to the first question is thus given by

$$P(X \ge 5) = \sum_{k=5}^{20} p_X(k) = 1 - \sum_{k=0}^{4} p_X(k) = 1 - (0.012 + 0.058 + 0.137 + 0.205 + 0.218) = 0.37.$$

The average number of replacements is

$$20 - E\{X\} = 20 - np = 20 - 20(0.2) = 16.$$
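The figures in Example 6.1 can be reproduced with a few lines of Python. The sketch below simply evaluates Equation (6.2) for n = 20 and p = 0.2 and is intended only as a check on the hand calculation.

```python
from math import comb

n, p = 20, 0.2
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

prob_at_least_5 = 1 - sum(pmf[:5])   # 1 - [p_X(0) + ... + p_X(4)]
expected_replacements = n - n * p    # 20 - E{X}

print([round(v, 3) for v in pmf[:5]])                    # [0.012, 0.058, 0.137, 0.205, 0.218]
print(round(prob_at_least_5, 2), expected_replacements)  # 0.37 16.0
```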

Example 6.2. Suppose that three telephone users use the same number and
that we are interested in estimating the probability that more than one will use
it at the same time. If independence of telephone habit is assumed, the
probability of exactly k persons requiring use of the telephone at the same time is
given by the mass function $p_X(k)$ associated with the binomial distribution. Let
it be given that, on average, a telephone user is on the phone 5 minutes per
hour; an estimate of p is

$$p = \frac{5}{60} = \frac{1}{12}.$$

The solution to this problem is given by

$$p_X(2) + p_X(3) = \binom{3}{2}\left(\frac{1}{12}\right)^2\left(\frac{11}{12}\right) + \binom{3}{3}\left(\frac{1}{12}\right)^3 = \frac{34}{1728} \approx 0.020.$$

Example 6.3. Problem: let $X_1$ and $X_2$ be two independent random variables,
both having binomial distributions with parameters $(n_1, p)$ and $(n_2, p)$,
respectively, and let $Y = X_1 + X_2$. Determine the distribution of random variable Y.

Answer: the characteristic functions of $X_1$ and $X_2$ are, according to the first
of Equations (6.8),

$$\phi_{X_1}(t) = (pe^{jt} + q)^{n_1}, \qquad \phi_{X_2}(t) = (pe^{jt} + q)^{n_2}.$$

In view of Equation (4.71), the characteristic function of Y is simply the
product of $\phi_{X_1}(t)$ and $\phi_{X_2}(t)$. Thus,

$$\phi_Y(t) = \phi_{X_1}(t)\phi_{X_2}(t) = (pe^{jt} + q)^{n_1 + n_2}.$$

By inspection, it is the characteristic function corresponding to a binomial
distribution with parameters $(n_1 + n_2, p)$. Hence, we have

$$p_Y(k) = \binom{n_1 + n_2}{k} p^k q^{n_1 + n_2 - k}, \quad k = 0, 1, \ldots, n_1 + n_2.$$
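Theorem 6.1 can also be checked numerically without characteristic functions: the pmf of a sum of independent integer-valued random variables is the discrete convolution of their pmfs. The Python sketch below, with arbitrarily chosen parameters and illustrative helper names, convolves the pmfs of B(n1, p) and B(n2, p) and compares the result with B(n1 + n2, p).

```python
from math import comb

def binomial_pmf_list(n, p):
    """[p_X(0), ..., p_X(n)] from Equation (6.2)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """pmf of the sum of two independent integer-valued random variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n1, n2, p = 4, 7, 0.3
direct = binomial_pmf_list(n1 + n2, p)
via_sum = convolve(binomial_pmf_list(n1, p), binomial_pmf_list(n2, p))
print(max(abs(a - b) for a, b in zip(direct, via_sum)))  # of the order of rounding error
```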

Generalizing the answer to Example 6.3, we have the following important 
result as stated in Theorem 6.1. 

Theorem 6.1: The binomial distribution generates itself under addition of 
independent random variables with the same p. 

Example 6.4. Problem: if random variables X and Y are independent binomially
distributed random variables with parameters $(n_1, p)$ and $(n_2, p)$, determine the
conditional probability mass function of X given that

$$X + Y = m, \quad 0 \le m \le n_1 + n_2.$$

Answer: for $k \le \min(n_1, m)$, we have

$$\begin{aligned} P(X = k \mid X + Y = m) &= \frac{P(X = k \cap X + Y = m)}{P(X + Y = m)} = \frac{P(X = k \cap Y = m - k)}{P(X + Y = m)} = \frac{P(X = k)P(Y = m - k)}{P(X + Y = m)} \\ &= \frac{\binom{n_1}{k}p^k q^{n_1 - k}\binom{n_2}{m - k}p^{m-k}q^{n_2 - m + k}}{\binom{n_1 + n_2}{m}p^m q^{n_1 + n_2 - m}} = \frac{\binom{n_1}{k}\binom{n_2}{m - k}}{\binom{n_1 + n_2}{m}}, \end{aligned} \tag{6.12}$$

where we have used the result given in Example 6.3 that X + Y is binomially
distributed with parameters $(n_1 + n_2, p)$.

The distribution given by Equation (6.12) is known as the hypergeometric
distribution. It arises, for example, as the distribution of the number of black
balls chosen when a sample of m balls is randomly selected from a lot of
n items having $n_1$ black balls and $n_2$ white balls ($n_1 + n_2 = n$). Let random
variable Z be this number. We have, from Equation (6.12), on replacing $n_2$
by $n - n_1$,

$$p_Z(k) = \frac{\binom{n_1}{k}\binom{n - n_1}{m - k}}{\binom{n}{m}}, \quad k = 0, 1, \ldots, \min(n_1, m). \tag{6.13}$$
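A quick numerical illustration of Equations (6.12) and (6.13): the Python sketch below, with illustrative helper names, evaluates the conditional pmf from its definition as a ratio of binomial probabilities and compares it with the hypergeometric form for several arbitrary values of p. That p has no effect reflects the cancellation shown in the derivation.

```python
from math import comb

def binomial_pmf(k, n, p):
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def conditional_pmf(k, n1, n2, m, p):
    """P(X = k | X + Y = m) = P(X = k) P(Y = m - k) / P(X + Y = m)."""
    return binomial_pmf(k, n1, p) * binomial_pmf(m - k, n2, p) / binomial_pmf(m, n1 + n2, p)

def hypergeometric_pmf(k, n, n1, m):
    """Equation (6.13); math.comb returns 0 when the lower index exceeds the upper."""
    return comb(n1, k) * comb(n - n1, m - k) / comb(n, m)

n1, n2, m = 5, 8, 6
for p in (0.2, 0.37, 0.9):
    for k in range(min(n1, m) + 1):
        diff = conditional_pmf(k, n1, n2, m, p) - hypergeometric_pmf(k, n1 + n2, n1, m)
        assert abs(diff) < 1e-12
print([round(hypergeometric_pmf(k, n1 + n2, n1, m), 4) for k in range(min(n1, m) + 1)])
```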


6.1.2 GEOMETRIC DISTRIBUTION 

Another event of interest arising from Bernoulli trials is the number of trials to 
(and including) the first occurrence of success. If X is used to represent this 
number, it is a discrete random variable with possible integer values ranging 
from one to infinity. Its pmf is easily computed to be 


$$p_X(k) = P(\underbrace{FF\cdots F}_{k-1}S) = \underbrace{P(F)P(F)\cdots P(F)}_{k-1}P(S) = q^{k-1}p, \quad k = 1, 2, \ldots. \tag{6.14}$$

This distribution is known as the geometric distribution with parameter p,
where the name stems from its similarity to the familiar terms in geometric
progression. A plot of $p_X(k)$ is given in Figure 6.1.
Figure 6.1 Geometric distribution $p_X(k)$

The corresponding probability distribution function is

$$F_X(x) = \sum_{k=1}^{m} p_X(k) = p + qp + \cdots + q^{m-1}p = (1 - q)(1 + q + q^2 + \cdots + q^{m-1}) = 1 - q^m, \tag{6.15}$$

where m is the largest integer less than or equal to x. The mean and variance of
X can be found as follows:


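$$E\{X\} = \sum_{k=1}^{\infty} kq^{k-1}p = p\sum_{k=1}^{\infty}\frac{d}{dq}(q^k) = p\frac{d}{dq}\left(\sum_{k=1}^{\infty} q^k\right) = p\frac{d}{dq}\left(\frac{q}{1-q}\right) = \frac{1}{p}. \tag{6.16}$$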


In the above, the interchange of summation and differentiation is allowed
because |q| < 1. Following the same procedure, the variance has the form

$$\sigma_X^2 = \frac{1 - p}{p^2}. \tag{6.17}$$
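Equations (6.16) and (6.17) can be checked by summing the series directly. The Python sketch below truncates the sums after a large, arbitrarily chosen number of terms, which is adequate because the terms decay geometrically when q < 1.

```python
def geometric_moments(p, terms=10_000):
    """Mean and variance of p_X(k) = q^(k-1) p, k >= 1, by truncated summation."""
    q = 1 - p
    mean = 0.0
    second = 0.0
    prob = p  # p_X(1)
    for k in range(1, terms + 1):
        mean += k * prob
        second += k * k * prob
        prob *= q  # p_X(k + 1) = q * p_X(k)
    return mean, second - mean**2

mean, var = geometric_moments(0.2)
print(mean, 1 / 0.2)              # Equation (6.16): 1/p
print(var, (1 - 0.2) / 0.2**2)    # Equation (6.17): (1 - p)/p^2
```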


Example 6.5. Problem: a driver is eagerly eyeing a precious parking space
some distance down the street. There are five cars in front of the driver, each of
which has a probability 0.2 of taking the space. What is the probability that
the car immediately ahead will enter the parking space?

Answer: for this problem, we have a geometric distribution and need to
evaluate $p_X(k)$ for k = 5 and p = 0.2. Thus,

$$p_X(5) = (0.8)^4(0.2) = 0.082,$$

which may seem much smaller than what we experience in similar situations.

Example 6.6. Problem: assume that the probability of a specimen failing 
during a given experiment is 0.1. What is the probability that it will take more 
than three specimens to have one surviving the experiment? 

Answer: let X denote the number of trials required for the first specimen 
to survive. It then has a geometric distribution with p = 0.9. The desired 
probability is 

$$P(X > 3) = 1 - F_X(3) = 1 - (1 - q^3) = (0.1)^3 = 0.001.$$

Example 6.7. Problem: let the probability of occurrence of a flood of magnitude
greater than a critical magnitude in any given year be 0.01. Assuming that
floods occur independently, determine E{N}, the average return period. The
average return period, or simply return period, is defined as the average number
of years between floods for which the magnitude is greater than the critical
magnitude.

Answer: it is clear that N is a random variable with a geometric distribution
and p = 0.01. The return period is then

$$E\{N\} = \frac{1}{p} = 100 \text{ years}.$$

The critical magnitude which gives rise to E{N} = 100 years is often referred to
as the ‘100-year flood’.
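The three geometric-distribution examples reduce to one-line evaluations of Equations (6.14) to (6.16). The Python sketch below, with illustrative function names, reproduces them.

```python
def geometric_pmf(k, p):
    """Equation (6.14): q^(k-1) p."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(m, p):
    """Equation (6.15): F_X(m) = 1 - q^m for integer m >= 0."""
    return 1 - (1 - p) ** m

parking = geometric_pmf(5, 0.2)               # Example 6.5
more_than_three = 1 - geometric_cdf(3, 0.9)   # Example 6.6
return_period = 1 / 0.01                      # Example 6.7: E{N} = 1/p
print(round(parking, 4), round(more_than_three, 4), return_period)  # 0.0819 0.001 100.0
```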


6.1.3 NEGATIVE BINOMIAL DISTRIBUTION 

A natural generalization of the geometric distribution is the distribution of 
random variable X representing the number of Bernoulli trials necessary for the 
rth success to occur, where r is a given positive integer. 

In order to determine $p_X(k)$ for this case, let A be the event that the first k − 1
trials yield exactly r − 1 successes, regardless of their order, and B the event that
a success turns up at the kth trial. Then, owing to independence,

$$p_X(k) = P(A \cap B) = P(A)P(B). \tag{6.18}$$

Now, P(A) is the binomial probability of r − 1 successes in k − 1 trials, or

$$P(A) = \binom{k-1}{r-1} p^{r-1} q^{k-r}, \quad k = r, r + 1, \ldots, \tag{6.19}$$
and P(B) is simply

$$P(B) = p. \tag{6.20}$$

Substituting Equations (6.19) and (6.20) into Equation (6.18) results in

$$p_X(k) = \binom{k-1}{r-1} p^r q^{k-r}, \quad k = r, r + 1, \ldots. \tag{6.21}$$


We note that, as expected, it reduces to the geometric distribution when r = 1. 
The distribution defined by Equation (6.21) is known as the negative binomial, 
or Pascal, distribution with parameters r and p. It is often denoted by NB(r,p). 

A useful variant of this distribution is obtained if we let Y = X − r. The random
variable Y is the number of Bernoulli trials beyond r needed for the realization
of the rth success, or it can be interpreted as the number of failures before
the rth success.

The probability mass function of Y, $p_Y(m)$, is obtained from Equation (6.21)
upon replacing k by m + r. Thus,

$$p_Y(m) = \binom{m + r - 1}{r - 1} p^r q^m = \binom{m + r - 1}{m} p^r q^m, \quad m = 0, 1, 2, \ldots. \tag{6.22}$$


We see that random variable Y has the convenient property that the range of 
m begins at zero rather than r for values associated with X. 

Recalling a more general definition of the binomial coefficient

$$\binom{a}{j} = \frac{a(a-1)\cdots(a-j+1)}{j!} \tag{6.23}$$

for any real a and any positive integer j (with $\binom{a}{0} = 1$), direct evaluation shows
that the binomial coefficient in Equation (6.22) can be written in the form

$$\binom{m + r - 1}{m} = (-1)^m \binom{-r}{m}. \tag{6.24}$$


Hence,

$$p_Y(m) = \binom{-r}{m} p^r (-q)^m, \quad m = 0, 1, 2, \ldots, \tag{6.25}$$

which is the reason for the name ‘negative binomial distribution’.
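Identity (6.24), and hence the form (6.25), is easy to confirm numerically using the general binomial coefficient (6.23). The Python sketch below implements (6.23) directly (the function names are hypothetical) and compares (6.22) with (6.25) term by term for arbitrarily chosen r and p.

```python
from math import comb, factorial

def general_binom(a, j):
    """Equation (6.23): a(a - 1)...(a - j + 1) / j! for real a and integer j >= 0."""
    numerator = 1.0
    for i in range(j):
        numerator *= a - i
    return numerator / factorial(j)

def nb_pmf_failures(m, r, p):
    """Equation (6.22): probability of m failures before the r-th success."""
    return comb(m + r - 1, m) * p**r * (1 - p) ** m

r, p = 3, 0.4
for m in range(6):
    via_negative = general_binom(-r, m) * p**r * (-(1 - p)) ** m  # Equation (6.25)
    print(m, round(nb_pmf_failures(m, r, p), 6), round(via_negative, 6))
```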

The mean and variance of random variable X can be determined either by
following the standard procedure or by noting that X can be represented by

$$X = X_1 + X_2 + \cdots + X_r, \tag{6.26}$$

where $X_j$ is the number of trials after the (j − 1)th success up to and including
the jth success. These random variables are mutually independent, each having the
geometric distribution with mean 1/p and variance $(1 - p)/p^2$. Therefore, the mean
and variance of sum X are, respectively, according to Equations (4.38) and (4.41),

$$m_X = \frac{r}{p}, \qquad \sigma_X^2 = \frac{r(1-p)}{p^2}. \tag{6.27}$$

Since Y = X − r, the corresponding moments of Y are

$$m_Y = \frac{r}{p} - r = \frac{r(1-p)}{p}, \qquad \sigma_Y^2 = \frac{r(1-p)}{p^2}. \tag{6.28}$$
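The moments in Equations (6.27) and (6.28) can be confirmed by summing the pmf (6.21) directly. In the Python sketch below the parameters are arbitrary, and the truncation point is simply chosen large enough that the omitted tail is negligible.

```python
from math import comb

def nb_pmf(k, r, p):
    """Equation (6.21): probability that the r-th success occurs on trial k."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def nb_moments(r, p, k_max=2_000):
    """Mean and variance of X by truncated summation over k = r, ..., k_max - 1."""
    ks = range(r, k_max)
    mean = sum(k * nb_pmf(k, r, p) for k in ks)
    second = sum(k * k * nb_pmf(k, r, p) for k in ks)
    return mean, second - mean**2

r, p = 5, 0.4
m_x, var_x = nb_moments(r, p)
print(round(m_x, 6), r / p)                 # Equation (6.27): r/p
print(round(var_x, 6), r * (1 - p) / p**2)  # Equation (6.27): r(1 - p)/p^2
print(round(m_x - r, 6), r * (1 - p) / p)   # Equation (6.28): mean of Y = X - r
```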


Example 6.8. Problem: a curbside parking facility has a capacity for three
cars. Determine the probability that it will be full within 10 minutes. It is
estimated that 6 cars will pass this parking space within the time span and, on
average, 80% of all cars will want to park there.

Answer: the desired probability is simply the probability that the number of
trials to the third success (taking the parking space) is less than or equal to 6. If
X is this number, it has a negative binomial distribution with r = 3 and p = 0.8.
Using Equation (6.21), we have

$$P(X \le 6) = \sum_{k=3}^{6} p_X(k) = \sum_{k=3}^{6}\binom{k-1}{2}(0.8)^3(0.2)^{k-3} = (0.8)^3[1 + (3)(0.2) + (6)(0.2)^2 + (10)(0.2)^3] = 0.983.$$


Let us note that an alternative way of arriving at this answer is to sum the
probabilities of having 3, 4, 5, and 6 successes in 6 Bernoulli trials using the
binomial distribution. This observation leads to a general relationship between
binomial and negative binomial distributions. Stated in general terms, if $X_1$ is
B(n, p) and $X_2$ is NB(r, p), then

$$P(X_1 \ge r) = P(X_2 \le n), \qquad P(X_1 < r) = P(X_2 > n). \tag{6.29}$$
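Example 6.8 and relation (6.29) can be cross-checked in a few lines. The Python sketch below, with illustrative helper names, computes $P(X_2 \le n)$ from the negative binomial pmf (6.21) and $P(X_1 \ge r)$ from the binomial pmf (6.2), first for the example's values and then over a small grid of parameters.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def nb_pmf(k, r, p):
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def p_nb_at_most(n, r, p):
    """P(X2 <= n): the r-th success occurs on or before trial n."""
    return sum(nb_pmf(k, r, p) for k in range(r, n + 1))

def p_binomial_at_least(r, n, p):
    """P(X1 >= r): at least r successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(r, n + 1))

print(round(p_nb_at_most(6, 3, 0.8), 3), round(p_binomial_at_least(3, 6, 0.8), 3))  # 0.983 0.983

for n in range(1, 15):
    for r in range(1, n + 1):
        for p in (0.1, 0.5, 0.8):
            assert abs(p_nb_at_most(n, r, p) - p_binomial_at_least(r, n, p)) < 1e-12
```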
Example 6.9. The negative binomial distribution is widely used in waiting-time
problems. Consider, for example, a car waiting on a ramp to merge into
freeway traffic. Suppose that it is 5th in line to merge and that the gaps between
cars on the freeway are such that there is a probability of 0.4 that they are
large enough for merging. Then, if X is the waiting time before merging for
this particular vehicle measured in terms of number of freeway gaps, it has
a negative binomial distribution with r = 5 and p = 0.4. The mean waiting time
is, as seen from Equation (6.27),

$$m_X = \frac{r}{p} = \frac{5}{0.4} = 12.5 \text{ gaps}.$$

6.2 MULTINOMIAL DISTRIBUTION 

Bernoulli trials can be generalized in several directions. A useful generalization
is to relax the requirement that there be only two possible outcomes for each
trial. Let there be r possible outcomes for each trial, denoted by $E_1, E_2, \ldots, E_r$,
and let $P(E_i) = p_i$, i = 1, ..., r, and $p_1 + p_2 + \cdots + p_r = 1$. A typical outcome of
n trials is a succession of symbols such as:

$$E_2E_1E_3E_3E_6E_2\cdots.$$

If we let random variable $X_i$, i = 1, 2, ..., r, represent the number of $E_i$ in a
sequence of n trials, the joint probability mass function (jpmf) of $X_1, X_2, \ldots, X_r$
is given by

$$p_{X_1X_2\cdots X_r}(k_1, k_2, \ldots, k_r) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\,p_1^{k_1}p_2^{k_2}\cdots p_r^{k_r}, \tag{6.30}$$

where $k_j = 0, 1, \ldots, n$, j = 1, 2, ..., r, and $k_1 + k_2 + \cdots + k_r = n$.

Proof for Equation (6.30): we want to show that the coefficient in Equation
(6.30) is equal to the number of ways of placing $k_1$ letters $E_1$, $k_2$ letters $E_2$, ...,
and $k_r$ letters $E_r$ in n boxes. This can be easily verified by writing

$$\frac{n!}{k_1!\,k_2!\cdots k_r!} = \binom{n}{k_1}\binom{n - k_1}{k_2}\cdots\binom{n - k_1 - k_2 - \cdots - k_{r-1}}{k_r}.$$

The first binomial coefficient is the number of ways of placing $k_1$ letters $E_1$
in n boxes; the second is the number of ways of placing $k_2$ letters $E_2$ in the
remaining $n - k_1$ unoccupied boxes; and so on.
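The counting argument in this proof can be checked for any particular choice of $k_1, \ldots, k_r$. The Python sketch below (hypothetical function names; `math.prod` requires Python 3.8 or later) computes the multinomial coefficient both ways.

```python
from math import comb, factorial, prod

def multinomial_coefficient(ks):
    """n! / (k1! k2! ... kr!) with n = k1 + ... + kr."""
    return factorial(sum(ks)) // prod(factorial(k) for k in ks)

def sequential_placements(ks):
    """Place k1 letters in n boxes, then k2 in the remaining boxes, and so on."""
    remaining = sum(ks)
    total = 1
    for k in ks:
        total *= comb(remaining, k)
        remaining -= k
    return total

ks = [3, 2, 4, 1]
print(multinomial_coefficient(ks), sequential_placements(ks))  # 12600 12600
```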
The formula given by Equation (6.30) is an important higher-dimensional
joint probability distribution. It is called the multinomial distribution because
it has the form of the general term in the multinomial expansion of
$(p_1 + p_2 + \cdots + p_r)^n$. We note that Equation (6.30) reduces to the binomial
distribution when r = 2 and with $p_1 = p$, $p_2 = q$, $k_1 = k$, and $k_2 = n - k$.

Since each $X_i$ defined above has a binomial distribution with parameters n
and $p_i$, we have

$$m_{X_i} = np_i, \qquad \sigma_{X_i}^2 = np_i(1 - p_i), \tag{6.31}$$

and it can be shown that the covariance is given by

$$\text{cov}(X_i, X_j) = -np_ip_j, \quad i, j = 1, 2, \ldots, r, \quad i \ne j. \tag{6.32}$$

Example 6.10. Problem: income levels are classified as low, medium, and high in
a study of incomes of a given population. If, on average, 10% of the population
belongs to the low-income group and 20% belongs to the high-income group, what
is the probability that, of the 10 persons studied, 3 will be in the low-income group
and the remaining 7 will be in the medium-income group? What is the marginal
distribution of the number of persons (out of 10) at the low-income level?

Answer: let $X_1$ be the number of low-income persons in the group of 10
persons, $X_2$ be the number of medium-income persons, and $X_3$ be the number
of high-income persons. Then $X_1$, $X_2$, and $X_3$ have a multinomial distribution
with $p_1 = 0.1$, $p_2 = 0.7$, and $p_3 = 0.2$; n = 10. Thus,

$$p_{X_1X_2X_3}(3, 7, 0) = \frac{10!}{3!\,7!\,0!}(0.1)^3(0.7)^7(0.2)^0 = 0.01.$$

The marginal distribution of $X_1$ is binomial with n = 10 and p = 0.1.

We remark that, while the single-random-variable marginal distributions
are binomial, since $X_1, X_2, \ldots,$ and $X_r$ are not independent, the multinomial
distribution is not a product of binomial distributions.
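Both the probability in Example 6.10 and the covariance formula (6.32) can be checked by brute force. The Python sketch below evaluates Equation (6.30) and then enumerates every outcome $(k_1, k_2, k_3)$ with $k_1 + k_2 + k_3 = 10$ to compute $\text{cov}(X_1, X_2)$ exactly. The helper name is illustrative.

```python
from itertools import product
from math import factorial, prod

def multinomial_pmf(ks, ps):
    """Equation (6.30)."""
    coeff = factorial(sum(ks)) // prod(factorial(k) for k in ks)
    return coeff * prod(p**k for p, k in zip(ps, ks))

n, ps = 10, (0.1, 0.7, 0.2)
print(round(multinomial_pmf((3, 7, 0), ps), 4))  # 0.0099, about 0.01 (Example 6.10)

# Enumerate all (k1, k2, k3) with k1 + k2 + k3 = n to check Equation (6.32)
outcomes = [(k1, k2, n - k1 - k2) for k1, k2 in product(range(n + 1), repeat=2) if k1 + k2 <= n]
probs = [multinomial_pmf(ks, ps) for ks in outcomes]
mean1 = sum(w * ks[0] for w, ks in zip(probs, outcomes))
mean2 = sum(w * ks[1] for w, ks in zip(probs, outcomes))
cov12 = sum(w * (ks[0] - mean1) * (ks[1] - mean2) for w, ks in zip(probs, outcomes))
print(round(cov12, 6), -n * ps[0] * ps[1])  # cov(X1, X2) versus -n p1 p2
```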


6.3 POISSON DISTRIBUTION 

In this section we wish to consider a distribution that is used in a wide variety 
of physical situations. It is used in mathematical models for describing, in a 
specific interval of time, such events as the emission of α particles from a
radioactive substance, passenger arrivals at an airline terminal, the distribution 
of dust particles reaching a certain space, car arrivals at an intersection, and 
many other similar phenomena. 
To fix ideas in the following development, let us consider the problem of
passenger arrivals at a bus terminal during a specified time interval. We shall
use the notation X(0, t) to represent the number of arrivals during time interval
[0, t), where the notation [ ) denotes a left-closed and right-open interval; it is a
discrete random variable taking possible values 0, 1, 2, ..., whose distribution
clearly depends on t. For clarity, its pmf is written as

$$p_k(0, t) = P[X(0, t) = k], \quad k = 0, 1, 2, \ldots, \tag{6.33}$$

to show its explicit dependence on t. Note that this is different from our
standard notation for a pmf.

To study this problem, we make the following basic assumptions: 

• Assumption 1: the random variables $X(t_1, t_2), X(t_2, t_3), \ldots, X(t_{n-1}, t_n)$,
$t_1 < t_2 < \cdots < t_n$, are mutually independent, that is, the numbers of passenger
arrivals in nonoverlapping time intervals are independent of each other.

• Assumption 2: for sufficiently small Δt,

$$p_1(t, t + \Delta t) = \lambda\Delta t + o(\Delta t), \tag{6.34}$$

where o(Δt) stands for functions such that

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0. \tag{6.35}$$

This assumption says that, for a sufficiently small Δt, the probability of
having exactly one arrival is proportional to the length of Δt. The parameter
λ in Equation (6.34) is called the average density or mean rate of arrival for
reasons that will soon be made clear. For simplicity, it is assumed to be a
constant in this discussion; however, there is no difficulty in allowing it to
vary with time.

• Assumption 3: for sufficiently small Δt,

$$\sum_{k=2}^{\infty} p_k(t, t + \Delta t) = o(\Delta t). \tag{6.36}$$

This condition implies that the probability of having two or more arrivals
during a sufficiently small interval is negligible.
Figure 6.2 Interval [0, t + Δt)


It follows from Equations (6.34) and (6.36) that

$$p_0(t, t + \Delta t) = 1 - \sum_{k=1}^{\infty} p_k(t, t + \Delta t) = 1 - \lambda\Delta t + o(\Delta t). \tag{6.37}$$


In order to determine probability mass function $p_k(0, t)$ based on the
assumptions stated above, let us first consider $p_0(0, t)$. Figure 6.2 shows two
nonoverlapping intervals, [0, t) and [t, t + Δt). In order that there are no
arrivals in the total interval [0, t + Δt), we must have no arrivals in both
subintervals. Owing to the independence of arrivals in nonoverlapping
intervals, we thus can write

$$p_0(0, t + \Delta t) = p_0(0, t)\,p_0(t, t + \Delta t) = p_0(0, t)[1 - \lambda\Delta t + o(\Delta t)]. \tag{6.38}$$


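Rearranging Equation (6.38) and dividing both sides by Δt gives

$$\frac{p_0(0, t + \Delta t) - p_0(0, t)}{\Delta t} = -\lambda p_0(0, t) + \frac{o(\Delta t)}{\Delta t}\,p_0(0, t).$$

Upon letting Δt → 0, we obtain the differential equation

$$\frac{dp_0(0, t)}{dt} = -\lambda p_0(0, t). \tag{6.39}$$

Its solution satisfying the initial condition $p_0(0, 0) = 1$ is

$$p_0(0, t) = e^{-\lambda t}. \tag{6.40}$$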


The determination of $p_1(0, t)$ is similar. We first observe that one arrival in
[0, t + Δt) can be accomplished only by having no arrival in subinterval [0, t)
and one arrival in [t, t + Δt), or one arrival in [0, t) and no arrival in [t, t + Δt).
Hence we have

$$p_1(0, t + \Delta t) = p_0(0, t)\,p_1(t, t + \Delta t) + p_1(0, t)\,p_0(t, t + \Delta t). \tag{6.41}$$
Substituting Equations (6.34), (6.37), and (6.40) into Equation (6.41) and
letting Δt → 0, we obtain

$$\frac{dp_1(0, t)}{dt} = -\lambda p_1(0, t) + \lambda e^{-\lambda t}, \qquad p_1(0, 0) = 0, \tag{6.42}$$

which yields

$$p_1(0, t) = \lambda t e^{-\lambda t}. \tag{6.43}$$

Continuing in this way we find, for the general term,

$$p_k(0, t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, 2, \ldots. \tag{6.44}$$

Equation (6.44) gives the pmf of X(0, t), the number of arrivals during
time interval [0, t) subject to the assumptions stated above. It is called the
Poisson distribution, with parameters λ and t. However, since λ and t appear in
Equation (6.44) as a product, λt, it can be replaced by a single parameter ν,
ν = λt, and so we can also write

$$p_k(0, t) = \frac{\nu^k e^{-\nu}}{k!}, \quad k = 0, 1, 2, \ldots. \tag{6.45}$$
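The derivation can be mirrored numerically. Continuing the argument of Equations (6.38) to (6.42) to general k gives the system $dp_k(0, t)/dt = -\lambda p_k(0, t) + \lambda p_{k-1}(0, t)$, with $p_0(0, 0) = 1$ and $p_k(0, 0) = 0$ for k ≥ 1. The Python sketch below integrates a truncated version of this system with the simple forward Euler method, chosen for transparency rather than accuracy, and compares the result with Equation (6.44). The values of λ, t, and the step count are arbitrary.

```python
import math

def poisson_by_euler(lam, t_end, k_max, steps=100_000):
    """Forward-Euler solution of dp_k/dt = -lam p_k + lam p_{k-1}, k = 0..k_max."""
    dt = t_end / steps
    p = [1.0] + [0.0] * k_max  # p_0(0, 0) = 1, p_k(0, 0) = 0 for k >= 1
    for _ in range(steps):
        # The right-hand side uses the values from the previous step only
        p = [p[k] + dt * (-lam * p[k] + (lam * p[k - 1] if k > 0 else 0.0))
             for k in range(k_max + 1)]
    return p

lam, t = 2.0, 1.5
numeric = poisson_by_euler(lam, t, k_max=6)
exact = [(lam * t) ** k * math.exp(-lam * t) / math.factorial(k) for k in range(7)]  # Eq. (6.44)
for k in range(7):
    print(k, round(numeric[k], 4), round(exact[k], 4))
```

A smaller step size brings the two columns closer together, as expected for a first-order method.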

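The mean of X(0, t) is given by

$$E\{X(0, t)\} = \sum_{k=0}^{\infty} k\,p_k(0, t) = \sum_{k=1}^{\infty} \frac{k(\lambda t)^k e^{-\lambda t}}{k!} = \lambda t e^{-\lambda t}\sum_{k=1}^{\infty}\frac{(\lambda t)^{k-1}}{(k-1)!} = \lambda t. \tag{6.46}$$

Similarly, we can show that

$$\sigma^2_{X(0, t)} = \lambda t. \tag{6.47}$$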
It is seen from Equation (6.46) that parameter λ is equal to the average
number of arrivals per unit interval of time; the name ‘mean rate of arrival’ for
λ, as mentioned earlier, is thus justified. In determining the value of this
parameter in a given problem, it can be estimated from observations by m/n,
where m is the observed number of arrivals in n unit time intervals. Similarly,
ν = λt represents the average number of arrivals in time interval [0, t).

Figure 6.3 Poisson distribution $p_k(0, t)$ for several values of λt: (a) λt = 0.5;
(b) λt = 1.0; (c) λt = 4.0

Also it is seen from Equation (6.47) that, as expected, the variance, as well
as the mean, increases as the mean rate increases. The Poisson distribution for
several values of λt is shown in Figure 6.3. In general, if we examine the ratio of
$p_k(0, t)$ and $p_{k-1}(0, t)$, as we did for the binomial distribution, we find that
$p_k(0, t)/p_{k-1}(0, t) = \lambda t/k$. Hence $p_k(0, t)$ increases monotonically and then
decreases monotonically as k increases, reaching its maximum when k is the largest
integer not exceeding λt (if λt is an integer, the maximum is attained at both
k = λt − 1 and k = λt).
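As a check on Equations (6.45) to (6.47) and on the location of the maximum, the following Python sketch evaluates the Poisson pmf for an arbitrarily chosen non-integer ν = λt, sums a truncated series for the mean and variance, and finds the most probable value by brute force.

```python
import math

def poisson_pmf(k, nu):
    """Equation (6.45), with nu = lambda * t."""
    return nu**k * math.exp(-nu) / math.factorial(k)

nu = 3.7
ks = range(60)  # terms beyond k = 59 are negligible for nu = 3.7
pmf = [poisson_pmf(k, nu) for k in ks]
mean = sum(k * w for k, w in zip(ks, pmf))
var = sum(k * k * w for k, w in zip(ks, pmf)) - mean**2
mode = max(ks, key=lambda k: pmf[k])
print(round(mean, 6), round(var, 6), mode)  # 3.7 3.7 3
```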

Example 6.11. Problem: traffic load in the design of a pavement system is
an important consideration. Vehicles arrive at some point on the pavement in
a random manner both in space (amplitude and velocity) and in time (arrival
rate). Considering the time aspect alone, observations are made at 30-second
intervals as shown in Table 6.1.

Table 6.1 Observed frequencies (number of observations) of 0, 1, 2, ... vehicles
arriving in a 30-second interval (for Example 6.11)

No. of vehicles per 30 s    Frequency
0                           18
1                           32
2                           28
3                           20
4                           13
5                            7
6                            0
7                            1
8                            1
≥9                           0
Total                      120

Suppose that the rate of 10 vehicles per minute is the level of critical traffic
load. Determine the probability that this critical level is reached or exceeded.

Let X(0, t) be the number of vehicles per minute passing some point on the
pavement. It can be assumed that all conditions for a Poisson distribution are
satisfied in this case. The pmf of X(0, t) is thus given by Equation (6.44). From
the data, the average number of vehicles per 30 seconds is

$$\frac{0(18) + 1(32) + 2(28) + \cdots + 9(0)}{120} = 2.08.$$

Hence, an estimate of λt, with t = 1 minute, is 2.08(2) = 4.16. The desired
probability is, then,

$$P[X(0, t) \ge 10] = \sum_{k=10}^{\infty} p_k(0, t) = 1 - \sum_{k=0}^{9}\frac{(4.16)^k e^{-4.16}}{k!} \cong 1 - 0.990 = 0.010.$$
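The tedious part of Example 6.11 is easy to automate. The Python sketch below estimates the rate from Table 6.1 and evaluates the Poisson tail. As an assumption of the sketch, the "≥9" row is entered as 9 with frequency 0, which does not affect the sums, and the per-30-second mean is rounded to 2.08 before doubling, as in the text.

```python
import math

# Table 6.1: vehicles per 30 s -> observed frequency (the ">= 9" row entered as 9)
frequencies = {0: 18, 1: 32, 2: 28, 3: 20, 4: 13, 5: 7, 6: 0, 7: 1, 8: 1, 9: 0}

n_intervals = sum(frequencies.values())
mean_per_30s = sum(k * f for k, f in frequencies.items()) / n_intervals
lam_t = round(mean_per_30s, 2) * 2  # estimate of lambda*t for t = 1 minute

# P[X(0, t) >= 10] = 1 - sum of Equation (6.44) over k = 0..9
cdf_9 = sum(lam_t**k * math.exp(-lam_t) / math.factorial(k) for k in range(10))
print(n_intervals, round(mean_per_30s, 2), lam_t, round(1 - cdf_9, 4))  # 120 2.08 4.16 0.0105
```

Carried out to full precision, the tail sum comes to about 0.0105 for λt = 4.16.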


The calculations involved in Example 6.11 are tedious. Because of its wide
applicability, the Poisson distribution for different values of λt is tabulated
in the literature. Table A.2 in Appendix A gives its mass function for values
of λt ranging from 0.1 to 10. Figure 6.4 is also convenient for determining
the PDF associated with a Poisson-distributed random variable. The answer
to Example 6.11, for example, can easily be read off from Figure 6.4. We
mention again that a large number of computer software packages are
available to produce these probabilities. For example, function POISSON in
Microsoft® Excel™ 2000 gives the Poisson probabilities given by Equation
(6.44) (see Appendix B).

Example 6.12. Problem: let $X_1$ and $X_2$ be two independent random variables,
both having Poisson distributions with parameters $\nu_1$ and $\nu_2$, respectively, and
let $Y = X_1 + X_2$. Determine the distribution of Y.

Answer: we proceed by determining first the characteristic functions of $X_1$
and $X_2$. They are

$$\phi_{X_1}(t) = \sum_{k=0}^{\infty}\frac{e^{jtk}\nu_1^k e^{-\nu_1}}{k!} = \exp[\nu_1(e^{jt} - 1)]$$

and

$$\phi_{X_2}(t) = \exp[\nu_2(e^{jt} - 1)].$$

Owing to independence, the characteristic function of Y, $\phi_Y(t)$, is simply the
product of $\phi_{X_1}(t)$ and $\phi_{X_2}(t)$ [see Equation (4.71)]. Hence,

$$\phi_Y(t) = \phi_{X_1}(t)\phi_{X_2}(t) = \exp[(\nu_1 + \nu_2)(e^{jt} - 1)].$$

By inspection, it is the characteristic function corresponding to a Poisson
distribution with parameter $\nu_1 + \nu_2$. Its pmf is thus

$$p_Y(k) = \frac{(\nu_1 + \nu_2)^k e^{-(\nu_1 + \nu_2)}}{k!}, \quad k = 0, 1, 2, \ldots. \tag{6.48}$$


As in the case of the binomial distribution, this result leads to the following
important theorem.

Theorem 6.2: the Poisson distribution generates itself under addition of 
independent random variables. 
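Theorem 6.2 can be illustrated the same way as Theorem 6.1, by convolving two Poisson pmfs and comparing with Equation (6.48). In the Python sketch below the parameters are arbitrary; for every k up to the cut-off the convolution needs only terms with indices no larger than k, so no truncation error enters.

```python
import math

def poisson_pmf_list(nu, k_max):
    """[p(0), ..., p(k_max)] for a Poisson distribution with parameter nu."""
    return [nu**k * math.exp(-nu) / math.factorial(k) for k in range(k_max + 1)]

nu1, nu2, k_max = 1.3, 2.4, 25
a = poisson_pmf_list(nu1, k_max)
b = poisson_pmf_list(nu2, k_max)

# p_Y(k) = sum over i of p_X1(i) p_X2(k - i)
conv = [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(k_max + 1)]
direct = poisson_pmf_list(nu1 + nu2, k_max)  # Equation (6.48)
print(max(abs(x - y) for x, y in zip(conv, direct)))  # of the order of rounding error
```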

Example 6.13. Problem: suppose that the probability of an insect laying r
eggs is $\nu^r e^{-\nu}/r!$, r = 0, 1, ..., and that the probability of an egg developing is p.
Assuming mutual independence of individual developing processes, show that
the probability of a total of k survivors is $(p\nu)^k e^{-p\nu}/k!$.
Answer: let X be the number of eggs laid by the insect, and Y be the number
of eggs developed. Then, given X = r, the distribution of Y is binomial with
parameters r and p. Thus,

$$P(Y = k \mid X = r) = \binom{r}{k} p^k (1 - p)^{r-k}, \quad k = 0, 1, \ldots, r.$$

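Now, using the total probability theorem, Theorem 2.1 [Equation (2.27)],

$$P(Y = k) = \sum_{r=k}^{\infty} P(Y = k \mid X = r)\,P(X = r) = \sum_{r=k}^{\infty} \frac{r!}{k!\,(r - k)!} p^k (1 - p)^{r-k} \frac{\nu^r e^{-\nu}}{r!}. \qquad (6.49)$$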


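If we let r = k + n, Equation (6.49) becomes

$$
\begin{aligned}
P(Y = k) &= \sum_{n=0}^{\infty} \frac{(n + k)!}{k!\,n!} p^k (1 - p)^n \frac{\nu^{n+k} e^{-\nu}}{(n + k)!} \\
&= \frac{(p\nu)^k e^{-\nu}}{k!} \sum_{n=0}^{\infty} \frac{[\nu(1 - p)]^n}{n!} = \frac{(p\nu)^k e^{-\nu} e^{\nu(1 - p)}}{k!} \\
&= \frac{(p\nu)^k e^{-p\nu}}{k!}, \qquad k = 0, 1, 2, \ldots
\end{aligned}
\qquad (6.50)
$$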


An important observation can be made based on this result. It implies that, if 
a random variable X is Poisson distributed with parameter ν, then a random
variable Y, obtained from X by retaining each of the items counted by X
independently with probability p, is also Poisson distributed with parameter pν. Other
examples of the application of this result include situations in which Y is the 
number of disaster-level hurricanes when X is the total number of hurricanes 
occurring in a given year, or Y is the number of passengers not being able to board 
a given flight, owing to overbooking, when X is the number of passenger arrivals. 
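The thinning result of Example 6.13 can likewise be checked by evaluating the sum in Equation (6.49) numerically, truncated at a large r, and comparing it with the closed form (6.50). This is a minimal Python sketch; the values ν = 4 and p = 0.35 and the truncation point r_max are illustrative assumptions.

```python
import math


def poisson_pmf(k: int, nu: float) -> float:
    """Poisson pmf nu^k e^{-nu} / k! (log-space evaluation, nu > 0)."""
    return math.exp(k * math.log(nu) - nu - math.lgamma(k + 1))


def survivors_pmf(k: int, nu: float, p: float, r_max: int = 150) -> float:
    """Equation (6.49): sum over r >= k of C(r, k) p^k (1-p)^(r-k) P(X = r),
    truncated at r_max (the neglected Poisson tail is negligible for moderate nu)."""
    return sum(
        math.comb(r, k) * p**k * (1 - p) ** (r - k) * poisson_pmf(r, nu)
        for r in range(k, r_max + 1)
    )


nu, p = 4.0, 0.35  # illustrative values (assumed)
for k in range(5):
    # Left column: Equation (6.49); right column: Poisson(p * nu), Equation (6.50).
    print(k, f"{survivors_pmf(k, nu, p):.6f}", f"{poisson_pmf(k, p * nu):.6f}")
```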


6.3.1 SPATIAL DISTRIBUTIONS 

The Poisson distribution has been derived based on arrivals developing in time,
but the same argument applies to the distribution of points in space. Consider the
distribution of flaws in a material. The number of flaws in a given volume has
a Poisson distribution if Assumptions 1-3 are valid, with time intervals replaced by
volumes, and if it is reasonable to assume that the probability of finding k flaws in
any region depends only on the volume and not on the shape of the region.

Other physical situations in which the Poisson distribution is used include
bacteria counts on a Petri plate, the distribution of airplane-spread fertilizers
in a field, and the distribution of industrial pollutants in a given region.

Example 6.14. A good example of this application is the study carried out by
Clark (1946) concerning the distribution of flying-bomb hits in one part of London
during World War II. The area is divided into 576 small areas of 0.25 km² each.
In Table 6.2, the number n_k of areas with exactly k hits is recorded and
is compared with the predicted number based on a Poisson distribution, with
λt = number of total hits per number of areas = 537/576 = 0.932. We see an
excellent agreement between the predicted and observed results.

Table 6.2 Comparison of the observed and theoretical
distributions of flying-bomb hits, for Example 6.14

k | Observed n_k | Predicted (Poisson)
0 | 229 | 226.7
1 | 211 | 211.4
2 | 93 | 98.5
3 | 35 | 30.6
4 | 7 | 7.1
≥5 | 1 | 1.6
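The predicted column of Table 6.2 follows from multiplying the Poisson pmf with λt = 537/576 by the 576 areas, with the last cell collecting all k ≥ 5. The Python sketch below reproduces that column from the numbers quoted in Example 6.14; forming the ≥5 entry as 576 minus the other predictions is our assumption about how it was obtained, although it does reproduce the tabulated 1.6.

```python
import math

N_AREAS = 576
TOTAL_HITS = 537
lam_t = TOTAL_HITS / N_AREAS  # about 0.932 hits per area


def poisson_pmf(k: int, nu: float) -> float:
    return math.exp(k * math.log(nu) - nu - math.lgamma(k + 1))


observed = [229, 211, 93, 35, 7, 1]  # k = 0, 1, 2, 3, 4, >=5 (Table 6.2)
predicted = [N_AREAS * poisson_pmf(k, lam_t) for k in range(5)]
predicted.append(N_AREAS - sum(predicted))  # all remaining areas go to the k >= 5 cell

for label, obs, pred in zip(["0", "1", "2", "3", "4", ">=5"], observed, predicted):
    print(f"{label:>3} {obs:5d} {pred:8.1f}")
```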


6.3.2 THE POISSON APPROXIMATION TO THE BINOMIAL 
DISTRIBUTION 


Let X be a random variable having the binomial distribution with

$$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n. \qquad (6.51)$$

Consider the case when n → ∞ and p → 0 in such a way that np = ν remains
fixed. We note that ν is the mean of X, which is assumed to remain constant. Then,

$$p_X(k) = \binom{n}{k} \left(\frac{\nu}{n}\right)^k \left(1 - \frac{\nu}{n}\right)^{n-k}, \quad k = 0, 1, \ldots, n. \qquad (6.52)$$


As n → ∞, the factorials n! and (n − k)! appearing in the binomial coefficient
can be approximated by using Stirling's formula [Equation (4.78)]. We also
note that

$$\lim_{n \to \infty} \left(1 + \frac{\nu}{n}\right)^n = e^{\nu}. \qquad (6.53)$$

Using these relationships in Equation (6.52) then gives, after some manipulation,

$$p_X(k) = \frac{\nu^k e^{-\nu}}{k!}, \quad k = 0, 1, \ldots \qquad (6.54)$$

This Poisson approximation to the binomial distribution can be used to advantage
from the point of view of computational labor. It also establishes that a close
relationship exists between these two important distributions.
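To see the limit (6.54) at work, the Python sketch below holds ν = np = 2 fixed (an arbitrary choice) and reports the largest pointwise gap between the binomial pmf (6.51) and the Poisson pmf (6.54) as n grows. Only k ≤ 20 is compared, an assumption that is harmless here because both pmfs are negligible beyond that point for ν = 2.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, nu: float) -> float:
    return math.exp(k * math.log(nu) - nu - math.lgamma(k + 1))


def max_pmf_gap(n: int, nu: float, k_max: int = 20) -> float:
    """Largest |binomial(n, nu/n) - Poisson(nu)| over k = 0..min(n, k_max)."""
    p = nu / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, nu)) for k in range(min(n, k_max) + 1))


gaps = {n: max_pmf_gap(n, 2.0) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}: max gap = {gap:.2e}")  # shrinks roughly like 1/n
```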

Example 6.15. Problem: suppose that the probability of a transistor manufactured
by a certain firm being defective is 0.015. What is the probability that
there is no defective transistor in a batch of 100?

Answer: let X be the number of defective transistors in 100. The desired
probability is

$$p_X(0) = \binom{100}{0} (0.015)^0 (0.985)^{100} = (0.985)^{100} = 0.2206.$$

Since n is large and p is small in this case, the Poisson approximation is
appropriate and we obtain, with ν = np = 1.5,

$$p_X(0) \cong \frac{\nu^0 e^{-\nu}}{0!} = e^{-1.5} = 0.223,$$

which is very close to the exact answer. In practice, the Poisson approximation
is frequently used when n > 10 and p < 0.1.
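The two numbers in Example 6.15 are easy to reproduce. This Python snippet (an illustration, with variable names of our own choosing) evaluates the exact binomial probability and its Poisson approximation side by side.

```python
import math

n, p = 100, 0.015
exact = math.comb(n, 0) * p**0 * (1 - p) ** n       # binomial p_X(0) = (0.985)^100
nu = n * p                                          # Poisson parameter nu = np = 1.5
approx = nu**0 * math.exp(-nu) / math.factorial(0)  # Poisson p_X(0) = e^{-1.5}

print(f"exact  p_X(0) = {exact:.4f}")   # 0.2206
print(f"approx p_X(0) = {approx:.3f}")  # 0.223
```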

Example 6.16. Problem: in oil exploration, the probability of an oil strike
in the North Sea is 1 in 500 drillings. What is the probability of having exactly
3 oil-producing wells in 1000 explorations?

Answer: in this case, n = 1000 and p = 1/500 = 0.002, and the Poisson
approximation is appropriate. Using Equation (6.54), we have ν = np = 2,
and the desired probability is

$$p_X(3) \cong \frac{2^3 e^{-2}}{3!} = 0.18.$$
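For comparison, the exact binomial probability in Example 6.16 can be computed directly; the sketch below shows that it differs from the Poisson value only in the fourth decimal place. The exact figure is computed here for illustration and is not quoted in the text.

```python
import math

n, p, k = 1000, 1 / 500, 3
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)  # binomial pmf, Equation (6.51)
nu = n * p                                           # nu = np = 2
approx = nu**k * math.exp(-nu) / math.factorial(k)   # Poisson pmf, Equation (6.54)

print(f"exact  = {exact:.4f}")   # 0.1806
print(f"approx = {approx:.4f}")  # 0.1804
```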

The examples above demonstrate that the Poisson distribution finds applications
in problems where the probability of an event occurring is small. For this
reason, it is often referred to as the distribution of rare events.


6.4 SUMMARY 

We have introduced in this chapter several discrete distributions that are used 
extensively in science and engineering. Table 6.3 summarizes some of the 
important properties associated with these distributions, for easy reference. 




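Table 6.3 Summary of discrete distributions

Distribution | Probability mass function | Parameters | Mean | Variance
Binomial | $\binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \ldots, n$ | $n, p$ | $np$ | $np(1-p)$
Hypergeometric | $\binom{n_1}{k}\binom{n-n_1}{m-k} \big/ \binom{n}{m}$, $k = 0, 1, \ldots, \min(m, n_1)$ | $n, n_1, m$ | $mn_1/n$ | $\dfrac{mn_1(n-n_1)(n-m)}{n^2(n-1)}$
Geometric | $p(1-p)^{k-1}$, $k = 1, 2, \ldots$ | $p$ | $1/p$ | $(1-p)/p^2$
Negative binomial (Pascal) | $\binom{k-1}{r-1} p^r (1-p)^{k-r}$, $k = r, r+1, \ldots$ | $r, p$ | $r/p$ | $r(1-p)/p^2$
Multinomial | $\dfrac{n!}{k_1! \cdots k_r!} p_1^{k_1} \cdots p_r^{k_r}$, $k_1, \ldots, k_r = 0, 1, \ldots, n$; $\sum_{i=1}^{r} k_i = n$, $\sum_{i=1}^{r} p_i = 1$ | $n, p_i$, $i = 1, \ldots, r$ | $np_i$ | $np_i(1-p_i)$
Poisson | $(\lambda t)^k e^{-\lambda t}/k!$, $k = 0, 1, 2, \ldots$ | $\lambda t$ | $\lambda t$ | $\lambda t$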
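The Mean and Variance columns of Table 6.3 can be spot-checked by summing k p(k) and (k − mean)² p(k) over each support. The Python sketch below does this for the single-variable distributions (the multinomial is omitted because its marginals are binomial). All parameter values are arbitrary illustrations, and the infinite supports of the geometric, negative binomial, and Poisson cases are truncated at a point where the remaining mass is negligible.

```python
import math


def moments(pmf, support):
    """Mean and variance of a pmf by direct summation over a (finite) support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var


n, p = 12, 0.3           # binomial / geometric parameters (illustrative)
n_pop, n1, m = 20, 7, 5  # hypergeometric: population n, n1 of one kind, sample size m
r = 3                    # negative binomial: number of successes required
lam_t = 2.5              # Poisson parameter

# name -> (pmf, support, mean from Table 6.3, variance from Table 6.3)
checks = {
    "binomial": (lambda k: math.comb(n, k) * p**k * (1 - p) ** (n - k),
                 range(n + 1), n * p, n * p * (1 - p)),
    "hypergeometric": (lambda k: math.comb(n1, k) * math.comb(n_pop - n1, m - k) / math.comb(n_pop, m),
                       range(min(m, n1) + 1), m * n1 / n_pop,
                       m * n1 * (n_pop - n1) * (n_pop - m) / (n_pop**2 * (n_pop - 1))),
    "geometric": (lambda k: p * (1 - p) ** (k - 1),
                  range(1, 2000), 1 / p, (1 - p) / p**2),
    "negative binomial": (lambda k: math.comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r),
                          range(r, 2000), r / p, r * (1 - p) / p**2),
    "Poisson": (lambda k: math.exp(k * math.log(lam_t) - lam_t - math.lgamma(k + 1)),
                range(200), lam_t, lam_t),
}

for name, (pmf, support, mean_formula, var_formula) in checks.items():
    mean, var = moments(pmf, support)
    print(f"{name:18s} mean {mean:.6f} vs {mean_formula:.6f}   var {var:.6f} vs {var_formula:.6f}")
```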
PROBLEMS


6.1 The random variable X has a binomial distribution with parameters (n, p). Using the
formulation given by Equation (6.10), derive its probability mass function (pmf),
mean, and variance and compare them with results given in Equations (6.2) and
(6.8). (Hint: see Example 4.18, page 109.)

6.2 Let X be the number of defective parts produced on a certain production line. It is 
known that, for a given lot, X is binomial, with mean equal to 240, and variance 
48. Determine the pmf of X and the probability that none of the parts is defective 
in this lot. 

6.3 An experiment is repeated 5 times. Assuming that the probability of an experiment 
being successful is 0.75 and assuming independence of experimental outcomes: 

(a) What is the probability that all five experiments will be successful? 

(b) How many experiments are expected to succeed on average? 

6.4 Suppose that the probability is 0.2 that the air pollution level in a given region will 
be in the unsafe range. What is the probability that the level will be unsafe 7 days in 
a 30-day month? What is the average number of ‘unsafe’ days in a 30-day month? 

6.5 An airline estimates that 5% of the people making reservations on a certain flight 
will not show up. Consequently, their policy is to sell 84 tickets for a flight that can 
only hold 80 passengers. What is the probability that there will be a seat available 
for every passenger that shows up? What is the average number of no-shows? 

6.6 Assuming that each child has probability 0.51 of being a boy: 

(a) Find the probability that a family of four children will have (i) exactly one boy, 
(ii) exactly one girl, (iii) at least one boy, and (iv) at least one girl. 

(b) Find the number of children a couple should have in order that the probability 
of their having at least two boys will be greater than 0.75. 

6.7 Suppose there are five customers served by a telephone exchange and that each 
customer may demand one line or none in any given minute. The probability of 
demanding one line is 0.25 for each customer, and the demands are independent. 

(a) What is the probability distribution function of X, a random variable representing
the number of lines required in any given minute?

(b) If the exchange has three lines, what is the probability that the customers will 
all be satisfied? 

6.8 A park-by-permit-only facility has m parking spaces. A total of n (n > m) parking
permits are issued, and each permit holder has a probability p of using the facility 
in a given period. 

(a) Determine the probability that a permit holder will be denied a parking space 
in the given time period. 

(b) Determine the expected number of people turned away in the given time period. 

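6.9 For the hypergeometric distribution given by Equation (6.13), show that as n → ∞
it approaches the binomial distribution with parameters m and n1/n; that is,

$$\lim_{n \to \infty} \frac{\binom{n_1}{k}\binom{n - n_1}{m - k}}{\binom{n}{m}} = \binom{m}{k} \left(\frac{n_1}{n}\right)^k \left(1 - \frac{n_1}{n}\right)^{m-k}, \quad k = 0, 1, \ldots, m,$$

and thus that the hypergeometric distribution can be approximated by a binomial
distribution as n → ∞.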
6.10 A manufacturing firm receives a lot of 100 parts, of which 5 are defective. Suppose
that the firm accepts all 100 parts if and only if no defective ones are found in a 
sample of 10 parts randomly selected for inspection. Determine the probability that 
this lot will be accepted. 

6.11 A shipment of 10 boxes of meat contains 2 boxes of contaminated goods. An 
inspector randomly selects 4 boxes; let Z be the number of boxes of contaminated 
meat among the selected 4 boxes. 

(a) What is the pmf of Z? 

(b) What is the probability that at least one of the four boxes is contaminated? 

(c) How many boxes must be selected so that the probability of having at least one
contaminated box is larger than 0.75? 

6.12 In a sequence of Bernoulli trials with probability p of success, determine the
probability that r successes will occur before s failures.

6.13 Cars arrive independently at an intersection. Assuming that, on average, 25% of 
the cars turn left and that the left-turn lane has a capacity for 5 cars, what is the 
probability that capacity is reached in the left-turn lane when 10 cars are delayed by 
a red signal? 

6.14 Suppose that n independent steps must be taken in the sterilization procedure for
a biological experiment, each of which has a probability p of success. If a failure
in any of the n steps would cause contamination, what is the probability of
contamination if n = 10 and p = 0.99?

6.15 An experiment is repeated in a civil engineering laboratory. The outcomes of these 
experiments are considered independent, and the probability of an experiment 
being successful is 0.7. 

(a) What is the probability that no more than 6 attempts are necessary to produce 
3 successful experiments? 

(b) What is the average number of failures before 3 successful experiments occur? 

(c) Suppose one needs 3 consecutive successful experiments. What is the probability
that exactly 6 attempts are necessary?

6.16 The definition of the 100-year flood is given in Example 6.7. 

(a) Determine the probability that exactly one flood equaling or exceeding the
100-year flood will occur in a 100-year period.

(b) Determine the probability that one or more floods equaling or exceeding the
100-year flood will occur in a 100-year period.

6.17 A shipment of electronic parts is sampled by testing items sequentially until the first 
defective part is found. If 10 or more parts are tested before the first defective part 
is found, the shipment is accepted as meeting specifications. 

(a) Determine the probability that the shipment will be accepted if it contains 10% 
defective parts. 

(b) How many items need to be sampled if it is desired that a shipment with 25% 
defective parts be rejected with probability of at least 0.75? 

6.18 Cars enter an interchange from the south. On average, 40% want to go west, 10% 
east, and 50% straight on (north). Of 8 cars entering the interchange: 

(a) Determine the joint probability mass function (jpmf) of X1 (cars westbound),
X2 (cars eastbound), and X3 (cars going straight on).

(b) Determine the probability that half will go west and half will go east. 

(c) Determine the probability that more than half will go west. 
6.19 For Example 6.10, determine the jpmf of X1 and X2. Determine the probability
that, of the 10 persons studied, fewer than 2 persons will be in the low-income 
group and fewer than 3 persons will be in the middle-income group. 

6.20 The following describes a simplified countdown procedure for launching 3 space 
vehicles from 2 pads: 

• Two vehicles are erected simultaneously on two pads and the countdown proceeds
on one vehicle.

• When the countdown has been successfully completed on the first vehicle, the
countdown is initiated on the second vehicle, the following day.

• Simultaneously, the vacated pad is immediately cleaned and prepared for the
third vehicle. There is a (fixed) delay of r days after the launching before
the same pad may be utilized for a second launch attempt (the turnaround time).

• After the third vehicle is erected on the vacated pad, the countdown procedure is
not initiated until the day after the second vehicle is launched.

• Each vehicle is independent of, and identical to, the others. On any single
countdown attempt there is a probability p of a successful completion and a
probability q (q = 1 − p) of failure. Any failure results in the termination of that
countdown attempt and a new attempt is made the following day. That is, any
failure leads to a one-day delay. It is assumed that a successful countdown
attempt can be completed in one day.

• The failure to complete a countdown does not affect subsequent attempts in any way; 
that is, the trials are independent from day to day as well as from vehicle to vehicle. 

Let X be the number of days until the third successful countdown. Show that the 
pmf of X is given by: 


pAk) = {k-r- l)//-'- 2 (i _ ^,- 1 ) ^ = r + 2,r+3,.... 

6.21 Derive the variance of a Poisson-distributed random variable X as given by 
Equation (6.47). 

6.22 Show that, for the Poisson distribution, p_k(0, t) increases monotonically and then
decreases monotonically as k increases, reaching its maximum when k is the largest
integer not exceeding λt.

6.23 At a certain plant, accidents have been occurring at an average rate of 1 every 2 
months. Assume that the accidents occur independently. 

(a) What is the average number of accidents per year? 

(b) What is the probability of there being no accidents in a given month? 

6.24 Assume that the number of traffic accidents in New York State during a 4-day
Memorial Day weekend is Poisson-distributed with parameter λ = 3.25 per day. Determine
the probability that the number of accidents is less than 10 in this 4-day period.

6.25 A radioactive source is observed during 7 time intervals, each interval being 10 
seconds in duration. The number of particles emitted during each period is 
counted. Suppose that the number of particles emitted, say X, during each 
observed period has an average rate of 0.5 particles per second. 

(a) What is the probability that 4 or more particles are emitted in each interval? 

(b) What is the probability that in at least 1 of 7 time intervals, 4 or more particles 
are emitted? 
6.26 Each air traffic controller at an airport is given the responsibility of monitoring at
most 20 takeoffs and landings per hour. During a given period, the average rate of
takeoffs and landings is 1 every 2 minutes. Assuming Poisson arrivals and departures,
determine the probability that 2 controllers will be needed in this time
period.

6.27 The number of vehicles crossing a certain point on a highway during a unit time
period has a Poisson distribution with parameter λ. A traffic counter is used to
record this number but, owing to limited capacity, it registers the maximum
number of 30 whenever the count equals or exceeds 30. Determine the pmf of Y
if Y is the number of vehicles recorded by the counter.

6.28 As an application of the Poisson approximation to the binomial distribution,
estimate the probability that in a class of 200 students exactly 20 will have birthdays
on any given day.

6.29 A book of 500 pages contains on average 1 misprint per page. Estimate the 
probability that: 

(a) A given page contains at least 1 misprint. 

(b) At least 3 pages will contain at least 1 misprint. 

6.30 Earthquakes are registered at an average frequency of 250 per year in a given 
region. Suppose that the probability is 0.09 that any earthquake will have a 
magnitude greater than 5 on the Richter scale. Assuming independent occurrences 
of earthquakes, determine the pmf of X, the number of earthquakes greater than 5 
on the Richter scale per year. 

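6.31 Let X be the number of accidents in which a driver is involved in t years. In
proposing a distribution for X, the ‘accident likelihood’ Λ varies from driver to
driver and is considered as a random variable. Suppose that the conditional pmf
p_X|Λ(k|λ) is given by the Poisson distribution,

$$p_{X|\Lambda}(k \mid \lambda) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, 2, \ldots,$$

and suppose that the probability density function (pdf) of Λ is of the form
(a, b > 0)

$$f_\Lambda(\lambda) = \begin{cases} \dfrac{a}{b\,\Gamma(a)} \left(\dfrac{a\lambda}{b}\right)^{a-1} \exp\left(-\dfrac{a\lambda}{b}\right), & \text{for } \lambda \ge 0, \\ 0, & \text{elsewhere,} \end{cases}$$

where Γ(a) is the gamma function, defined by

$$\Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\,dx.$$

Show that the pmf of X has a negative binomial distribution in the form

$$p_X(k) = \frac{\Gamma(a + k)}{k!\,\Gamma(a)} \left(\frac{a}{a + bt}\right)^a \left(\frac{bt}{a + bt}\right)^k, \quad k = 0, 1, 2, \ldots$$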
6.32 Suppose that λ, the mean rate of arrival in the Poisson distribution, is
time-dependent and is given by λ(t).

Determine the pmf, that is, the probability of exactly k arrivals, in the time interval
[0, t). [Note that differential equations such as Equations (6.39) and (6.42) now
have time-dependent coefficients.]

6.33 Derive the jpmf of two Poisson random variables X1 and X2, where X1 = X(0, t1)
and X2 = X(0, t2), t2 > t1, with the same mean rate of arrival λ. Determine the probability
P(X1 ≤ λt1 ∩ X2 ≤ λt2). This is the probability that the numbers of arrivals
in intervals [0, t1) and [0, t2) are both equal to or less than the average arrivals in
their respective intervals.
Some Important Continuous Distributions


Let us turn our attention to some important continuous probability distributions.
Physical quantities such as time, length, area, temperature, pressure, load,
intensity, etc., when they need to be described probabilistically, are modeled by
continuous random variables. A number of important continuous distributions
are introduced in this chapter and, as in Chapter 6, we are also concerned with
the nature and applications of these distributions in science and engineering.


7.1 UNIFORM DISTRIBUTION 

A continuous random variable X has a uniform distribution over an interval a to
b (b > a) if it is equally likely to take on any value in this interval. The probability
density function (pdf) of X is constant over interval (a, b) and has the form

$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & \text{for } a \le x \le b; \\ 0, & \text{elsewhere.} \end{cases} \qquad (7.1)$$


As we see from Figure 7.1(a), it is constant over (a, b), and the height must be
1/(b − a) in order that the area under the density function is unity.

The probability distribution function (PDF) is, on integrating Equation (7.1),

$$F_X(x) = \begin{cases} 0, & \text{for } x < a; \\ \dfrac{x - a}{b - a}, & \text{for } a \le x \le b; \\ 1, & \text{for } x > b; \end{cases} \qquad (7.2)$$
which is graphically presented in Figure 7.1(b).

Figure 7.1 (a) The probability density function, f_X(x), and (b) the probability
distribution function, F_X(x), of X

The mean, m_X, and variance, σ_X², of X are easily found to be

$$m_X = \frac{a + b}{2}, \qquad \sigma_X^2 = \frac{(b - a)^2}{12}. \qquad (7.3)$$
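Equation (7.3) can be confirmed by numerical integration of Equation (7.1). The sketch below uses a plain midpoint rule in Python (our choice; any quadrature would do) with the interval (22, 30) of Example 7.1 as an illustrative case; the exact values are (a + b)/2 = 26 and (b − a)²/12 = 16/3.

```python
def uniform_moments(a: float, b: float, steps: int = 100_000):
    """Midpoint-rule approximation of the mean and variance of X uniform on (a, b)."""
    h = (b - a) / steps
    density = 1.0 / (b - a)  # constant height of the pdf, Equation (7.1)
    xs = [a + (i + 0.5) * h for i in range(steps)]
    mean = sum(x * density * h for x in xs)
    var = sum((x - mean) ** 2 * density * h for x in xs)
    return mean, var


a, b = 22.0, 30.0
mean, var = uniform_moments(a, b)
print(mean, (a + b) / 2)       # both about 26
print(var, (b - a) ** 2 / 12)  # both about 5.3333
```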


The uniform distribution is one of the simplest distributions and is commonly
used in situations where there is no reason to give unequal likelihoods to
possible ranges assumed by the random variable over a given interval. For 
example, the arrival time of a flight might be considered uniformly distributed 
over a certain time interval, or the distribution of the distance from the location 
of live loads on a bridge to an end support might be adequately represented by 
a uniform distribution over the bridge span. Let us also comment that one often 
assigns a uniform distribution to a specific random variable simply because of 
a lack of information, beyond knowing the range of values it spans. 

Example 7.1. Problem: owing to unpredictable traffic situations, the time
required by a certain student to travel from her home to her morning class
is uniformly distributed between 22 and 30 minutes. If she leaves home at precisely
7:35 a.m., what is the probability that she will not be late for class, which
begins promptly at 8:00 a.m.?

Answer: let X be the class arrival time of the student in minutes after 8:00 a.m.
Since she arrives between 7:57 a.m. and 8:05 a.m., X has a uniform distribution given by

$$f_X(x) = \begin{cases} \dfrac{1}{8}, & \text{for } -3 \le x \le 5; \\ 0, & \text{elsewhere.} \end{cases}$$
Figure 7.2 Probability density function, f_X(x), of X, in Example 7.1


We are interested in the probability P(−3 ≤ X ≤ 0). As seen from Figure 7.2, it
is clear that this probability is equal to the ratio of the shaded area and the unit
total area. Hence,

$$P(-3 \le X \le 0) = (3)\left(\frac{1}{8}\right) = \frac{3}{8}.$$

It is also clear that, owing to uniformity in the distribution, the solution can
be found simply by taking the ratio of the length from −3 to 0 to the total length
of the distribution interval. Stated in general terms, if a random variable X is
uniformly distributed over an interval A, then the probability of X taking
values in a subinterval B is given by

$$P(X \text{ in } B) = \frac{\text{length of } B}{\text{length of } A}. \qquad (7.4)$$
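Equation (7.4) translates directly into code. The helper below is a hypothetical illustration (its name and signature are ours): it intersects the requested interval with the support, so that a B extending beyond A is handled too, and then divides lengths. Applied to Example 7.1, with X uniform on (−3, 5) minutes after 8:00 a.m., it returns 3/8.

```python
def uniform_interval_prob(a: float, b: float, c: float, d: float) -> float:
    """P(c <= X <= d) for X uniform on (a, b): length of the overlap over b - a, Eq. (7.4)."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0.0) / (b - a)


# Example 7.1: leaving at 7:35 a.m. with a 22-30 minute trip means arrival between
# 7:57 and 8:05, i.e. X uniform on (-3, 5) minutes relative to 8:00 a.m.
p_on_time = uniform_interval_prob(-3, 5, -3, 0)
print(p_on_time)  # 0.375 = 3/8
```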


7.1.1 BIVARIATE UNIFORM DISTRIBUTION 

Let random variable X be uniformly distributed over an interval (a1, b1), and let
random variable Y be uniformly distributed over an interval (a2, b2). Furthermore,
let us assume that they are independent. Then, the joint probability
density function of X and Y is simply

$$f_{XY}(x, y) = f_X(x) f_Y(y) = \begin{cases} \dfrac{1}{(b_1 - a_1)(b_2 - a_2)}, & \text{for } a_1 \le x \le b_1 \text{ and } a_2 \le y \le b_2; \\ 0, & \text{elsewhere.} \end{cases} \qquad (7.5)$$
It takes the shape of a flat surface bounded by (a1, b1) along the x axis and
(a2, b2) along the y axis. We have seen an application of this bivariate uniform
distribution in Example 3.7 (page 57). Indeed, Example 3.7 gives a typical
situation in which the distribution given by Equation (7.5) is conveniently
applied. Let us give one more example.

Example 7.2. Problem: a warehouse receives merchandise and fills a specific 
order for the same merchandise in any given day. Suppose that it receives 
merchandise with equal likelihood during equal intervals of time over the 
eight-hour working day and likewise for the order to be filled, (a) What is the 
probability that the order will arrive after the merchandise is received and (b) 
what is the probability that the order will arrive within two hours after the 
receipt of merchandise? 

Answer: let X be the time of receipt of merchandise expressed as a fraction of
an eight-hour working day, and let Y be the time of receipt of the order
similarly expressed. Then

$$f_X(x) = \begin{cases} 1, & \text{for } 0 \le x \le 1; \\ 0, & \text{elsewhere;} \end{cases} \qquad (7.6)$$

and similarly for f_Y(y). The joint probability density function (jpdf) of X and Y
is, assuming independence,

$$f_{XY}(x, y) = \begin{cases} 1, & \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1; \\ 0, & \text{elsewhere;} \end{cases}$$

and is shown in Figure 7.3.



Figure 7.3 Joint probability density function, f_XY(x, y), of X and Y in Example 7.2
To answer the first question, in part (a), we integrate f_XY(x, y) over an
appropriate region in the (x, y) plane satisfying y > x. Since f_XY(x, y) is a
constant over (0, 0) ≤ (x, y) ≤ (1, 1), this is the same as taking the ratio of the
area satisfying y > x to the total area bounded by (0, 0) ≤ (x, y) ≤ (1, 1), which
is unity. As seen from Figure 7.4(a), we have

$$P(Y > X) = \text{shaded area } A = \frac{1}{2}.$$

We proceed the same way in answering the second question, in part (b). Since two
hours is one-quarter of the eight-hour working day, it is easy to see that the
appropriate region for this part is the shaded area B, as shown in Figure 7.4(b).
The desired probability is, after dividing area B into two subregions (a strip of
width 1/4 over 0 ≤ x ≤ 3/4 and a triangle over 3/4 ≤ x ≤ 1),

$$P\left(X < Y \le X + \frac{1}{4}\right) = \text{shaded area } B = \frac{3}{4}\cdot\frac{1}{4} + \frac{1}{2}\left(\frac{1}{4}\right)^2 = \frac{7}{32}.$$


We see from Example 7.2 that calculations of various probabilities of interest
in this situation involve taking ratios of appropriate areas. If random variables
X and Y are jointly uniformly distributed over a region A, then the
probability of X and Y taking values in a subregion B is given by

$$P[(X, Y) \text{ in } B] = \frac{\text{area of } B}{\text{area of } A}. \qquad (7.7)$$
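Equation (7.7) also suggests a simple Monte Carlo check of Example 7.2: sample points uniformly in the unit square and count the fraction that falls in each region. The Python sketch below does this; the sample size and seed are arbitrary, so the estimates agree with 1/2 and 7/32 ≈ 0.219 only to about two decimal places.

```python
import random


def estimate_example_7_2(n: int = 200_000, seed: int = 1):
    """Monte Carlo estimates of P(Y > X) and P(X < Y <= X + 1/4) for X, Y independent U(0, 1)."""
    rng = random.Random(seed)
    after = within_two_hours = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()  # receipt of merchandise, receipt of order
        if y > x:
            after += 1
            if y <= x + 0.25:  # two hours is 1/4 of an eight-hour working day
                within_two_hours += 1
    return after / n, within_two_hours / n


p_a, p_b = estimate_example_7_2()
print(f"P(Y > X)            ~ {p_a:.3f}  (exact 1/2)")
print(f"P(X < Y <= X + 1/4) ~ {p_b:.3f}  (exact 7/32 = {7 / 32:.3f})")
```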




Figure 7.4 (a) Region A and (b) region B in the (x,y) plane in Example 7.2 
Figure 7.5 Joint probability density function, f_XY(x, y), of X and Y, given by Equation (7.8)


It is noteworthy that, if the independence assumption is removed, the jpdf
of two uniformly distributed random variables will not take the simple form
given by Equation (7.5). In the extreme case when X and Y are perfectly
correlated, the jpdf of X and Y degenerates from a surface into a line over the
(x, y) plane. For example, let X and Y be uniformly and identically distributed
over the interval (0, 1) and let X = Y. Then the jpdf of X and Y has the form

$$f_{XY}(x, y) = \frac{1}{\sqrt{2}}, \quad x = y \text{ and } (0, 0) \le (x, y) \le (1, 1), \qquad (7.8)$$

which is graphically presented in Figure 7.5. Here the density is spread along the
diagonal segment of length √2, so that its total mass is unity. More detailed
discussions on correlated and uniformly distributed random variables can be found
in Kramer (1940).


7.2 GAUSSIAN OR NORMAL DISTRIBUTION 

The most important probability distribution in theory as well as in application
is the Gaussian or normal distribution. A random variable X is Gaussian or
normal if its pdf f_X(x) is of the form

$$f_X(x) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right], \quad -\infty < x < \infty, \qquad (7.9)$$

where m and σ are two parameters, with σ > 0. Our choice of these particular
symbols for the parameters will become clear presently.
Its corresponding PDF is

$$F_X(x) = \frac{1}{(2\pi)^{1/2}\sigma} \int_{-\infty}^{x} \exp\left[-\frac{(u - m)^2}{2\sigma^2}\right] du, \quad -\infty < x < \infty, \qquad (7.10)$$

which cannot be expressed in closed form in terms of elementary functions but
can be numerically evaluated for any x.
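Although Equation (7.10) has no closed form in elementary functions, most numerical libraries provide the error function, in terms of which F_X(x) = ½[1 + erf((x − m)/(σ√2))]. The Python sketch below uses the standard library's math.erf and cross-checks it against a direct trapezoidal integration of the pdf (7.9); the lower limit m − 12σ stands in for −∞, an approximation whose error is far below the printed precision.

```python
import math


def normal_pdf(x: float, m: float = 0.0, sigma: float = 1.0) -> float:
    """Equation (7.9)."""
    return math.exp(-((x - m) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def normal_cdf(x: float, m: float = 0.0, sigma: float = 1.0) -> float:
    """Equation (7.10), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))


def normal_cdf_trapezoid(x: float, m: float = 0.0, sigma: float = 1.0, steps: int = 20_000) -> float:
    """Equation (7.10) by the trapezoidal rule, starting at m - 12 sigma instead of -infinity."""
    lo = m - 12 * sigma
    h = (x - lo) / steps
    total = 0.5 * (normal_pdf(lo, m, sigma) + normal_pdf(x, m, sigma))
    total += sum(normal_pdf(lo + i * h, m, sigma) for i in range(1, steps))
    return total * h


for x in (-1.0, 0.0, 1.5):
    print(x, f"{normal_cdf(x):.6f}", f"{normal_cdf_trapezoid(x):.6f}")
```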

The pdf and PDF expressed by Equations (7.9) and (7.10), respectively, are
graphed in Figures 7.6(a) and 7.6(b), respectively, for m = 0 and σ = 1. The
graph of f_X(x) in this particular case is the well-known bell-shaped curve,
symmetrical about the origin [Figure 7.6(a)].

Figure 7.6 (a) Probability density function, f_X(x), and (b) probability distribution
function, F_X(x), of X for m = 0 and σ = 1

Let us determine the mean and variance of X. By definition, the mean of X,
E{X}, is given by

$$E\{X\} = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{(2\pi)^{1/2}\sigma} \int_{-\infty}^{\infty} x \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right] dx,$$

which yields

$$E\{X\} = m.$$


Similarly, we can show that

$$\text{var}(X) = \sigma^2. \qquad (7.11)$$

We thus see that the two parameters m and σ in the probability distribution
are, respectively, the mean and standard deviation of X. This observation
justifies our choice of these special symbols for them and it also points out
an important property of the normal distribution; that is, the knowledge of
its mean and variance completely characterizes a normal distribution. Since the
normal distribution will be referred to frequently in our discussion, it is sometimes
represented by the simple notation N(m, σ²). Thus, for example,
X: N(0, 9) implies that X has the pdf given by Equation (7.9) with m = 0 and
σ = 3.

Higher-order moments of X also take simple forms and can be derived in
a straightforward fashion. Let us first state that, following the definition of
characteristic functions discussed in Section 4.5, the characteristic function of a
normal random variable X is

$$\phi_X(t) = E\{e^{jtX}\} = \frac{1}{(2\pi)^{1/2}\sigma} \int_{-\infty}^{\infty} \exp\left[jtx - \frac{(x - m)^2}{2\sigma^2}\right] dx = \exp\left(jmt - \frac{1}{2}\sigma^2 t^2\right). \qquad (7.12)$$


The moments of X of any order can now be found from the above through
differentiation. Expressed in terms of central moments, the use of Equation
(4.52) gives us

$$\mu_n = \begin{cases} 0, & \text{if } n \text{ is odd;} \\ 1(3) \cdots (n - 1)\,\sigma^n, & \text{if } n \text{ is even.} \end{cases} \qquad (7.13)$$
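Equation (7.13) is easy to confirm numerically. The Python sketch below compares the formula with a Simpson's-rule integration of (x − m)^n f_X(x); since the mean drops out of central moments, it integrates u^n against the zero-mean density. The value σ = 1.7, the integration range ±12σ, and the step count are arbitrary illustrative choices.

```python
import math


def central_moment_formula(n: int, sigma: float) -> float:
    """Equation (7.13): 0 for odd n, 1*3*...*(n-1) * sigma^n for even n."""
    if n % 2 == 1:
        return 0.0
    return math.prod(range(1, n, 2)) * sigma**n


def central_moment_numeric(n: int, sigma: float, steps: int = 20_000, width: float = 12.0) -> float:
    """Simpson's rule for E{(X - m)^n}: integrate u^n times the N(0, sigma^2) pdf over +/- width*sigma."""
    def f(u: float) -> float:
        return u**n * math.exp(-u * u / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

    a, b = -width * sigma, width * sigma
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, steps, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, steps, 2))
    return total * h / 3


sigma = 1.7
for n in range(1, 7):
    print(n, f"{central_moment_formula(n, sigma):.6f}", f"{central_moment_numeric(n, sigma):.6f}")
```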
Let us note in passing that γ2, the coefficient of excess, defined by Equation
(4.12), is zero for a normal distribution. Hence, the normal distribution is used
as the reference distribution for γ2.


7.2.1 THE CENTRAL LIMIT THEOREM 

The great practical importance associated with the normal distribution
stems from the powerful central limit theorem stated below (Theorem 7.1).
Instead of stating the theorem in its entire generality, we state a more
restricted version, attributable to Lindberg (1922), which serves our
purposes quite well.

Theorem 7.1: the central limit theorem. Let {X_n} be a sequence of mutually
independent and identically distributed random variables with means m and
variances σ². Let

$$Y = \sum_{j=1}^{n} X_j \qquad (7.14)$$

and let the normalized random variable Z be defined as

$$Z = \frac{Y - nm}{n^{1/2}\sigma}. \qquad (7.15)$$

Then the probability distribution function of Z, F_Z(z), converges to N(0, 1) as
n → ∞ for every fixed z.

Proof of Theorem 7.1: We first remark that, following our discussion in
Section 4.4 on moments of sums of random variables, random variable Y
defined by Equation (7.14) has mean nm and standard deviation n^(1/2)σ. Hence,
Z is simply the standardized random variable Y with zero mean and unit
standard deviation. In terms of the characteristic function φ_X(t) of random variables
X_j, the characteristic function of Y is simply

$$\phi_Y(t) = [\phi_X(t)]^n. \qquad (7.16)$$

Consequently, Z possesses the characteristic function

$$\phi_Z(t) = \exp\left(-\frac{jmn^{1/2}t}{\sigma}\right)\left[\phi_X\left(\frac{t}{n^{1/2}\sigma}\right)\right]^n. \qquad (7.17)$$
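Expanding φ_X(t) in a Maclaurin series as indicated by Equation (4.49), we
can write

$$
\begin{aligned}
\phi_Z(t) &= \left\{\exp\left(-\frac{jmt}{n^{1/2}\sigma}\right)\phi_X\left(\frac{t}{n^{1/2}\sigma}\right)\right\}^n \\
&= \left\{\left[1 - \frac{jmt}{n^{1/2}\sigma} - \frac{m^2 t^2}{2n\sigma^2} + o\left(\frac{1}{n}\right)\right]\left[1 + \frac{jmt}{n^{1/2}\sigma} - \frac{(\sigma^2 + m^2)t^2}{2n\sigma^2} + o\left(\frac{1}{n}\right)\right]\right\}^n \\
&= \left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n \to \exp\left(-\frac{t^2}{2}\right) \quad \text{as } n \to \infty.
\end{aligned}
\qquad (7.18)
$$

In the last step we have used the elementary identity

$$\lim_{n \to \infty} \left[1 + \frac{c}{n} + o\left(\frac{1}{n}\right)\right]^n = e^{c} \qquad (7.19)$$

for any real c.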

Comparing the result given by Equation (7.18) with the form of the characteristic
function of a normal random variable given by Equation (7.12), we
see that φ_Z(t) approaches the characteristic function of the zero-mean, unit-variance
normal distribution. The proof is thus complete.
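Theorem 7.1 can be watched in action with a case whose exact distribution is known: if the X_j are Bernoulli with success probability p, then Y in Equation (7.14) is binomial. The Python sketch below computes the exact largest gap between F_Z(z) and the N(0, 1) distribution function; because F_Z is a step function, it suffices to compare Φ at each support point with the distribution function just before and just after the jump. The choice p = 0.3 is illustrative.

```python
import math


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial pmf evaluated in log space so that large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def max_cdf_gap(n: int, p: float) -> float:
    """sup_z |F_Z(z) - Phi(z)| for Z = (Y - np) / sqrt(np(1-p)), Y a sum of n Bernoulli(p)."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))  # nm and n^(1/2) sigma in Eq. (7.15)
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        phi = std_normal_cdf((k - mean) / sd)
        gap = max(gap, abs(cdf - phi))  # left limit of F_Z at this support point
        cdf += binom_pmf(k, n, p)
        gap = max(gap, abs(cdf - phi))  # value of F_Z at this support point
    return gap


gaps = {n: max_cdf_gap(n, 0.3) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(f"n = {n:4d}: sup |F_Z - Phi| = {gap:.4f}")
```

The gap shrinks roughly like n^(−1/2), which is the rate the Berry–Esseen inequality guarantees for sums of this kind.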

As we mentioned earlier, this result is a somewhat restrictive version of the
central limit theorem. It can be extended in several directions, including cases
in which Y is a sum of dependent as well as nonidentically distributed random
variables.

The central limit theorem describes a very general class of random phenomena for which distributions can be approximated by the normal distribution. In words, when the randomness in a physical phenomenon is the cumulation of many small additive random effects, it tends to a normal distribution irrespective of the distributions of individual effects. For example, the gasoline consumption of all automobiles of a particular brand, supposedly manufactured under identical processes, differs from one automobile to another. This randomness stems from a wide variety of sources, including, among other things, inherent inaccuracies in manufacturing processes, nonuniformities in materials used, differences in weight and other specifications, differences in gasoline quality, and different driver behavior. If one accepts that each of these differences contributes to the randomness in gasoline consumption, the central limit theorem tells us that this consumption tends to be normally distributed. By the same reasoning, temperature variations in a room, readout errors associated with an instrument, target errors of a certain weapon, and so on can also be reasonably approximated by normal distributions.
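The claim that a cumulation of many small additive effects tends to normality can be explored by simulation. The Python sketch below assumes, purely for illustration, that each of 30 effects is Uniform(0, 1); it standardizes their sum as in the central limit theorem and compares the empirical P(Z ≤ 1) with F_U(1). The seed, sample sizes, and helper names are arbitrary choices of ours.

```python
import math
import random


def standard_normal_cdf(u: float) -> float:
    """F_U(u) for U ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def standardized_uniform_sum(n: int, rng: random.Random) -> float:
    """Standardize a sum of n Uniform(0, 1) effects (each with mean 1/2, variance 1/12)."""
    y = sum(rng.random() for _ in range(n))
    return (y - n * 0.5) / math.sqrt(n / 12.0)


rng = random.Random(2024)          # fixed seed for reproducibility
n_effects, n_trials = 30, 20_000
zs = [standardized_uniform_sum(n_effects, rng) for _ in range(n_trials)]

empirical = sum(z <= 1.0 for z in zs) / n_trials
print(f"empirical P(Z <= 1) = {empirical:.4f}; normal F_U(1) = {standard_normal_cdf(1.0):.4f}")
```

Replacing `rng.random()` with any other bounded, positive-variance effect (and adjusting the mean and variance used for standardization) should give a similar agreement.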

Let us also mention that, in view of the central limit theorem, our result in Example 4.17 (page 106) concerning a one-dimensional random walk should be no surprise. As the number of steps increases, the position of the particle is expected to become normally distributed in the limit.


7.2.2 PROBABILITY TABULATIONS 


Owing to its importance, we are often called upon to evaluate probabilities associated with a normal random variable X: N(m, σ²), such as

$$
P(a < X < b) = \frac{1}{(2\pi)^{1/2}\sigma}\int_a^b \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right]dx. \tag{7.20}
$$


However, as we commented earlier, the integral given above cannot be evaluated by analytical means and must generally be computed numerically. For convenience, tables are provided that enable us to determine probabilities such as the one expressed by Equation (7.20).

The tabulation of the PDF for the normal distribution with m = 0 and σ = 1 is given in Appendix A, Table A.3. A random variable with distribution N(0, 1) is called a standardized normal random variable, and we shall denote it by U. Table A.3 gives F_U(u) for points in the right half of the distribution only (i.e., for u ≥ 0). The corresponding values for u < 0 are obtained from the symmetry property of the standardized normal distribution [see Figure 7.6(a)] by the relationship


$$
F_U(-u) = 1 - F_U(u). \tag{7.21}
$$

First, Table A.3 in conjunction with Equation (7.21) can be used to determine P(a < U < b) for any a and b. Consider, for example, P(−1.5 < U < 2.5). It is given by

$$
P(-1.5 < U < 2.5) = F_U(2.5) - F_U(-1.5).
$$

The value of F_U(2.5) is found from Table A.3 to be 0.9938; F_U(−1.5) is equal to 1 − F_U(1.5), with F_U(1.5) = 0.9332, as seen from Table A.3. Thus

$$
P(-1.5 < U < 2.5) = F_U(2.5) - [1 - F_U(1.5)] = 0.9938 - 1 + 0.9332 = 0.9270.
$$

More importantly, Table A.3 and Equation (7.21) are also sufficient for determining probabilities associated with normal random variables with arbitrary means and variances. To do this, let us first state Theorem 7.2.
Theorem 7.2: Let X be a normal random variable with distribution N(m, σ²). Then (X − m)/σ is the standardized normal random variable with distribution N(0, 1), or

$$
U = \frac{X - m}{\sigma}. \tag{7.22}
$$


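Proof of Theorem 7.2: the characteristic function of random variable (X − m)/σ is

$$
E\left\{\exp\left[\frac{jt(X - m)}{\sigma}\right]\right\} = \exp\left(-\frac{jmt}{\sigma}\right)\phi_X\left(\frac{t}{\sigma}\right).
$$

From Equation (7.12) we have

$$
\phi_X(t) = \exp\left(jmt - \frac{\sigma^2t^2}{2}\right). \tag{7.23}
$$

Hence,

$$
E\left\{\exp\left[\frac{jt(X - m)}{\sigma}\right]\right\} = \exp\left(-\frac{jmt}{\sigma}\right)\exp\left(\frac{jmt}{\sigma} - \frac{t^2}{2}\right) = \exp\left(-\frac{t^2}{2}\right). \tag{7.24}
$$

The result given above takes the form of φ_X(t) with m = 0 and σ = 1, and the proof is complete.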

Theorem 7.2 implies that

$$
P(a < X < b) = P[a < (\sigma U + m) < b] = P\left(\frac{a - m}{\sigma} < U < \frac{b - m}{\sigma}\right). \tag{7.25}
$$

The value of the right-hand side can now be found from Table A.3, with the aid of Equation (7.21) if necessary.

As has been noted, probabilities provided by Table A.3 can also be obtained from a number of computer software packages such as Microsoft® Excel™ 2000 (see Appendix B).

Example 7.3. Problem: owing to many independent error sources, the length of a manufactured machine part is normally distributed with m = 11 cm and σ = 0.2 cm. If specifications require that the length be between 10.6 cm and 11.2 cm, what proportion of the manufactured parts will be rejected on average?

Answer: If X is used to denote the part length in centimeters, it is reasonable to assume that it is distributed according to N(11, 0.04). Thus, on average, the proportion of acceptable parts is P(10.6 < X < 11.2). From Equation (7.25), and using Table A.3, we have

$$
\begin{aligned}
P(10.6 < X < 11.2) &= P\left(\frac{10.6 - 11}{0.2} < U < \frac{11.2 - 11}{0.2}\right) \\
&= P(-2 < U < 1) = F_U(1) - [1 - F_U(2)] \\
&= 0.8413 - (1 - 0.9772) = 0.8185.
\end{aligned}
$$

The desired answer is then 1 − 0.8185, which gives 0.1815.
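Equation (7.25) translates directly into code: standardize both limits, then take the difference of F_U values. The Python sketch below (function names are ours) recomputes Example 7.3, together with the tail probability P(X < 0) taken up in the next paragraph. Small differences in the fourth decimal relative to the hand calculation come from table rounding.

```python
import math


def F_U(u: float) -> float:
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def normal_interval_prob(a: float, b: float, m: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(m, sigma^2), using Equation (7.25)."""
    return F_U((b - m) / sigma) - F_U((a - m) / sigma)


m, sigma = 11.0, 0.2                     # part length in cm (Example 7.3)
accepted = normal_interval_prob(10.6, 11.2, m, sigma)
rejected = 1.0 - accepted
p_negative = F_U((0.0 - m) / sigma)      # P(X < 0) = P(U < -55), effectively zero
print(f"accepted = {accepted:.4f}, rejected = {rejected:.4f}, P(X < 0) = {p_negative:.3g}")
```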

The use of the normal distribution in Example 7.3 raises an immediate concern. Normal random variables assume values in both positive and negative ranges, whereas the length of a machine part, as well as many other physical quantities, cannot take negative values. However, from a modeling point of view, it is a commonly accepted practice to use normal random variables as representations of nonnegative quantities inasmuch as the probability P(X < 0) is sufficiently small. In Example 7.3, for example, this probability is

$$
P(X < 0) = P\left(U < \frac{0 - 11}{0.2}\right) = P(U < -55) \approx 0.
$$


Example 7.4. Let us compute P(m − kσ < X < m + kσ) where X is distributed N(m, σ²). It follows from Equations (7.21) and (7.25) that

$$
P(m - k\sigma < X < m + k\sigma) = P(-k < U < k) = F_U(k) - F_U(-k) = 2F_U(k) - 1. \tag{7.26}
$$

We note that the result in Example 7.4 is independent of m and σ and is a function only of k. Thus, the probability that X takes values within k standard deviations about its expected value depends only on k and is given by Equation (7.26). It is seen from Table A.3 that 68.3%, 95.5%, and 99.7% of the area under a normal density function are located, respectively, in the ranges m ± σ, m ± 2σ, and m ± 3σ. This is illustrated in Figures 7.7(a)-7.7(c). For example, the chances are about 99.7% that a randomly selected sample from a normal distribution is within the range of m ± 3σ [Figure 7.7(c)].
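Because Equation (7.26) depends only on k, the familiar percentages can be checked in a few lines. This Python sketch computes 2F_U(k) − 1 with `math.erf`; the unrounded values are about 68.27%, 95.45%, and 99.73%, consistent with the table-based figures quoted above.

```python
import math


def F_U(u: float) -> float:
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def within_k_sigma(k: float) -> float:
    """P(m - k*sigma < X < m + k*sigma) = 2 F_U(k) - 1, Equation (7.26)."""
    return 2.0 * F_U(k) - 1.0


for k in (1, 2, 3):
    print(f"k = {k}: {100 * within_k_sigma(k):.2f}%")
```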
Figure 7.7 The area under the normal density function within the range (a) m ± σ, (b) m ± 2σ, and (c) m ± 3σ
7.2.3 MULTIVARIATE NORMAL DISTRIBUTION


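Consider two random variables X and Y. They are said to be jointly normal if their joint density function takes the form

$$
\begin{aligned}
f_{XY}(x, y) = {} & \frac{1}{2\pi\sigma_X\sigma_Y(1 - \rho^2)^{1/2}}\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - m_X}{\sigma_X}\right)^2 - 2\rho\,\frac{(x - m_X)(y - m_Y)}{\sigma_X\sigma_Y} + \left(\frac{y - m_Y}{\sigma_Y}\right)^2\right]\right\}, \\
& (-\infty, -\infty) < (x, y) < (\infty, \infty).
\end{aligned} \tag{7.27}
$$

Equation (7.27) describes the bivariate normal distribution. There are five parameters associated with it: m_X, m_Y, σ_X (greater than 0), σ_Y (greater than 0), and ρ (|ρ| < 1). A typical plot of this joint density function is given in Figure 7.8.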


Figure 7.8 Bivariate normal distribution with m_X = m_Y = 0 and σ_X = σ_Y
Let us determine the marginal density function of random variable X. It is given by, following straightforward calculations,

$$
f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy = \frac{1}{(2\pi)^{1/2}\sigma_X}\exp\left[-\frac{(x - m_X)^2}{2\sigma_X^2}\right], \qquad -\infty < x < \infty. \tag{7.28}
$$


Thus, random variable X by itself has a normal distribution N(m_X, σ_X²). Similar calculations show that Y is also normal with distribution N(m_Y, σ_Y²), and ρ = μ_XY/(σ_Xσ_Y) is the correlation coefficient of X and Y. We thus see that the five parameters contained in the bivariate density function f_XY(x, y) represent five important moments associated with the random variables. This also leads us to observe that the bivariate normal distribution is completely characterized by the first-order and second-order joint moments of X and Y.

Another interesting and important property associated with jointly normally 
distributed random variables is noted in Theorem 7.3. 

Theorem 7.3: Zero correlation implies independence when the random variables are jointly normal.

Proof of Theorem 7.3: let ρ = 0 in Equation (7.27). We easily get

$$
f_{XY}(x, y) = \left\{\frac{1}{(2\pi)^{1/2}\sigma_X}\exp\left[-\frac{(x - m_X)^2}{2\sigma_X^2}\right]\right\}\left\{\frac{1}{(2\pi)^{1/2}\sigma_Y}\exp\left[-\frac{(y - m_Y)^2}{2\sigma_Y^2}\right]\right\} = f_X(x)f_Y(y), \tag{7.29}
$$

which is the desired result. It should be stressed again, as in Section 4.3.1, that this property is not shared by random variables in general.
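A numerical illustration of Equations (7.27) through (7.29), written as a Python sketch with hypothetical parameter values: with ρ = 0 the joint density equals the product of the two normal marginals, and for ρ ≠ 0 integrating out y numerically (by the trapezoidal rule, a choice of ours) still returns the N(m_X, σ_X²) density of Equation (7.28).

```python
import math


def bivariate_normal_pdf(x, y, mx, my, sx, sy, rho):
    """Joint density of Equation (7.27); requires sx, sy > 0 and |rho| < 1."""
    zx, zy = (x - mx) / sx, (y - my) / sy
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))


def normal_pdf(x, m, s):
    """Univariate N(m, s^2) density."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)


def marginal_x(x, mx, my, sx, sy, rho, steps=4000):
    """Integrate f_XY(x, y) over y by the trapezoidal rule on my +/- 10 sy."""
    lo, hi = my - 10.0 * sy, my + 10.0 * sy
    h = (hi - lo) / steps
    total = 0.5 * (bivariate_normal_pdf(x, lo, mx, my, sx, sy, rho)
                   + bivariate_normal_pdf(x, hi, mx, my, sx, sy, rho))
    for i in range(1, steps):
        total += bivariate_normal_pdf(x, lo + i * h, mx, my, sx, sy, rho)
    return total * h


mx, my, sx, sy = 1.0, -0.5, 2.0, 0.8     # hypothetical parameters
x, y = 0.7, -1.2

joint = bivariate_normal_pdf(x, y, mx, my, sx, sy, rho=0.0)
product = normal_pdf(x, mx, sx) * normal_pdf(y, my, sy)
print(f"rho = 0: f_XY = {joint:.6e}, f_X f_Y = {product:.6e}")

marg = marginal_x(x, mx, my, sx, sy, rho=0.6)
print(f"rho = 0.6: integral over y = {marg:.8f}, f_X(x) = {normal_pdf(x, mx, sx):.8f}")
```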

We have the multivariate normal distribution when the case of two random 
variables is extended to that involving n random variables. For compactness, 
vector-matrix notation is used in the following. 

Consider a sequence of n random variables, X_1, X_2, ..., X_n. They are said to be jointly normal if the associated joint density function has the form

$$
f_{X_1X_2\cdots X_n}(x_1, x_2, \ldots, x_n) = f_{\mathbf{X}}(\mathbf{x}) = (2\pi)^{-n/2}|\Lambda|^{-1/2}\exp\left[-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T\Lambda^{-1}(\mathbf{x} - \mathbf{m})\right], \qquad -\infty < \mathbf{x} < \infty, \tag{7.30}
$$

where m^T = [m_1 m_2 ... m_n] = [E{X_1} E{X_2} ... E{X_n}], and Λ = [μ_ij] is the n × n covariance matrix of X with [see Equations (4.34) and (4.35)]:

$$
\mu_{ij} = E\{(X_i - m_i)(X_j - m_j)\}. \tag{7.31}
$$
The superscripts T and −1 denote, respectively, matrix transpose and matrix inverse. Again, we see that a joint normal distribution is completely specified by the first-order and second-order joint moments.

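It is instructive to derive the joint characteristic function associated with X. As seen from Section 4.5.3, it is defined by

$$
\phi_{X_1X_2\cdots X_n}(t_1, t_2, \ldots, t_n) = \phi_{\mathbf{X}}(\mathbf{t}) = E\{\exp[j(t_1X_1 + t_2X_2 + \cdots + t_nX_n)]\} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp(j\mathbf{t}^T\mathbf{x})f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}, \tag{7.32}
$$

which gives, on substituting Equation (7.30) into Equation (7.32),

$$
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(j\mathbf{m}^T\mathbf{t} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}\right), \tag{7.33}
$$

where t^T = [t_1 t_2 ... t_n].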

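Joint moments of X can be obtained by differentiating the joint characteristic function φ_X(t) with respect to t and setting t = 0. The expectation E{X_1^{m_1} X_2^{m_2} ··· X_n^{m_n}}, for example, is given by

$$
E\{X_1^{m_1}X_2^{m_2}\cdots X_n^{m_n}\} = j^{-(m_1 + m_2 + \cdots + m_n)}\left[\frac{\partial^{m_1 + m_2 + \cdots + m_n}\phi_{\mathbf{X}}(\mathbf{t})}{\partial t_1^{m_1}\partial t_2^{m_2}\cdots\partial t_n^{m_n}}\right]_{\mathbf{t} = \mathbf{0}}. \tag{7.34}
$$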


It is clear that, since joint moments of the first and second orders completely specify the joint normal distribution, these moments also determine joint moments of orders higher than 2. We can show that, in the case when random variables X_1, X_2, ..., X_n have zero means, all odd-order moments of these random variables vanish, and, for n even,

$$
E\{X_1X_2\cdots X_n\} = \sum_{m_1, m_2, \ldots, m_n} E\{X_{m_1}X_{m_2}\}E\{X_{m_3}X_{m_4}\}\cdots E\{X_{m_{n-1}}X_{m_n}\}. \tag{7.35}
$$

The sum above is taken over all possible combinations of n/2 pairs of the n random variables. The number of terms in the summation is (1)(3)(5)···(n − 3)(n − 1).
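The pairing sum in Equation (7.35) is easy to enumerate mechanically. This Python sketch (helper names and the covariance values are ours, for illustration) generates every split of an index list into pairs, confirms the term count (1)(3)(5)···(n − 1), and evaluates the moment from a covariance matrix. Repeating an index handles powers, giving the familiar E{X⁴} = 3σ⁴ for a zero-mean normal X.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def pairings(indices: List[int]) -> Iterator[List[Tuple[int, int]]]:
    """Yield every partition of `indices` (even length) into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def zero_mean_normal_moment(indices: Sequence[int], cov: Sequence[Sequence[float]]) -> float:
    """E{X_i1 X_i2 ... X_ik} for zero-mean jointly normal X, per Equation (7.35)."""
    if len(indices) % 2 == 1:
        return 0.0                      # odd-order moments vanish
    return sum(math.prod(cov[a][b] for a, b in p) for p in pairings(list(indices)))


# Number of terms for n = 2, 4, 6, 8 should be 1, 3, 15, 105.
print([sum(1 for _ in pairings(list(range(n)))) for n in (2, 4, 6, 8)])

cov = [[2.0, 0.5, 0.3, 0.1],            # illustrative positive-definite covariance matrix
       [0.5, 1.5, 0.2, 0.4],
       [0.3, 0.2, 1.5, 0.6],
       [0.1, 0.4, 0.6, 1.5]]
fourth = zero_mean_normal_moment([0, 1, 2, 3], cov)
# For n = 4: mu_12 mu_34 + mu_13 mu_24 + mu_14 mu_23
print(fourth, cov[0][1] * cov[2][3] + cov[0][2] * cov[1][3] + cov[0][3] * cov[1][2])
print(zero_mean_normal_moment([0, 0, 0, 0], cov))   # 3 * sigma^4 = 3 * 2.0**2
```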


7.2.4 SUMS OF NORMAL RANDOM VARIABLES 

We have seen through discussions and examples that sums of random variables 
arise in a number of problem formulations. In the case of normal random 
variables, we have the following important result (Theorem 7.4). 
Theorem 7.4: let X_1, X_2, ..., X_n be n jointly normally distributed random variables (not necessarily independent). Then random variable Y, where

$$
Y = c_1X_1 + c_2X_2 + \cdots + c_nX_n, \tag{7.36}
$$

is normally distributed, where c_1, c_2, ..., and c_n are constants.

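Proof of Theorem 7.4: for convenience, the proof will be given by assuming that all X_j, j = 1, 2, ..., n, have zero means. For this case, the mean of Y is clearly zero and its variance is, as seen from Equation (4.43),

$$
\sigma_Y^2 = E\{Y^2\} = \sum_{i=1}^{n}\sum_{j=1}^{n}c_ic_jE\{X_iX_j\} = \sum_{i=1}^{n}\sum_{j=1}^{n}c_ic_j\mu_{ij}, \tag{7.37}
$$

where μ_ij = cov(X_i, X_j).

Since the X_j are jointly normally distributed, their joint characteristic function is given by Equation (7.33), which is

$$
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\left(-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\mu_{ij}t_it_j\right). \tag{7.38}
$$

The characteristic function of Y is obtained by setting t_k = c_k t in Equation (7.38):

$$
\phi_Y(t) = E\{e^{jtY}\} = E\left\{\exp\left(jt\sum_{k=1}^{n}c_kX_k\right)\right\} = \phi_{\mathbf{X}}(c_1t, c_2t, \ldots, c_nt) = \exp\left(-\frac{1}{2}t^2\sum_{i=1}^{n}\sum_{j=1}^{n}\mu_{ij}c_ic_j\right) = \exp\left(-\frac{1}{2}\sigma_Y^2t^2\right), \tag{7.39}
$$

which is the characteristic function associated with a normal random variable. Hence Y is also a normal random variable.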
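Equation (7.37) and the normality claim of Theorem 7.4 can be checked by simulation. In the Python sketch below, the correlated pair is built as X_1 = Z_1, X_2 = ρZ_1 + (1 − ρ²)^{1/2}Z_2 from independent N(0, 1) variables Z_1 and Z_2. This standard construction is our choice here; it gives unit variances and correlation ρ. The coefficients, ρ, the seed, and the sample size are illustrative.

```python
import math
import random


def linear_combination_variance(c, cov):
    """sigma_Y^2 = sum_i sum_j c_i c_j mu_ij, Equation (7.37)."""
    n = len(c)
    return sum(c[i] * c[j] * cov[i][j] for i in range(n) for j in range(n))


rho = 0.5
cov = [[1.0, rho], [rho, 1.0]]     # unit variances, correlation rho
c = [2.0, -1.0]                    # illustrative coefficients
var_y = linear_combination_variance(c, cov)

rng = random.Random(7)
samples = []
for _ in range(50_000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = z1
    x2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2   # var 1, cov(x1, x2) = rho
    samples.append(c[0] * x1 + c[1] * x2)

sample_var = sum(s * s for s in samples) / len(samples)   # true mean is zero
sigma_y = math.sqrt(var_y)
frac_below = sum(s <= sigma_y for s in samples) / len(samples)
print(f"sigma_Y^2 = {var_y}, sample variance = {sample_var:.3f}")
print(f"P(Y <= sigma_Y) ~ {frac_below:.4f} (normal value F_U(1) = 0.8413)")
```

If Y is normal, P(Y ≤ σ_Y) should be close to F_U(1), which is what the last line compares.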

A further generalization of the above result is given in Theorem 7.5, which 
we shall state without proof. 

Theorem 7.5: let X_1, X_2, ..., and X_n be n jointly normally distributed random variables (not necessarily independent). Then random variables Y_1, Y_2, ..., and Y_m, where

$$
Y_j = \sum_{k=1}^{n}c_{jk}X_k, \qquad j = 1, 2, \ldots, m, \tag{7.40}
$$

are themselves jointly normally distributed.
7.3 LOGNORMAL DISTRIBUTION

We have seen that normal distributions arise from sums of many random actions. Consider now another common phenomenon which is the resultant of many multiplicative random effects. An example of multiplicative phenomena is in fatigue studies of materials where internal material damage at a given stage of loading is a random proportion of damage at the previous stage. In biology, the distribution of the size of an organism is another example for which growth is subject to many small impulses, each of which is proportional to the momentary size. Other examples include the size distribution of particles under impact or impulsive forces, the life distribution of mechanical components, the distribution of personal incomes due to annual adjustments, and other similar phenomena.

Let us consider

$$
Y = X_1X_2\cdots X_n. \tag{7.41}
$$

We are interested in the distribution of Y as n becomes large, when random variables X_j, j = 1, 2, ..., n, can take only positive values.

If we take logarithms of both sides, Equation (7.41) becomes

$$
\ln Y = \ln X_1 + \ln X_2 + \cdots + \ln X_n. \tag{7.42}
$$

The random variable ln Y is seen as a sum of random variables ln X_1, ln X_2, ..., and ln X_n. It thus follows from the central limit theorem that ln Y tends to a normal distribution as n → ∞. The probability distribution of Y is thus determined from

$$
Y = e^{X}, \tag{7.43}
$$

where X is a normal random variable.

Definition 7.1. Let X be N(m_X, σ_X²). The random variable Y as determined from Equation (7.43) is said to have a lognormal distribution.

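The pdf of Y is easy to determine. Since Equation (7.43) gives Y as a monotonic function of X, Equation (5.12) immediately gives

$$
f_Y(y) = \begin{cases} \dfrac{1}{y\sigma_X(2\pi)^{1/2}}\exp\left[-\dfrac{1}{2\sigma_X^2}(\ln y - m_X)^2\right], & \text{for } y > 0; \\ 0, & \text{elsewhere.} \end{cases} \tag{7.44}
$$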


Equation (7.44) shows that Y has a one-sided distribution (i.e., it takes values only in the positive range of y). This property makes it attractive for physical quantities that are restricted to having only positive values. In addition, f_Y(y) takes many different shapes for different values of m_X and σ_X (σ_X > 0). As seen from Figure 7.9, the pdf of Y is skewed to the right, this characteristic becoming more pronounced as σ_X increases.

Figure 7.9 Lognormal distribution, f_Y(y), with m_X = 0, for several values of σ_X

It is noted that parameters m_X and σ_X appearing in the pdf of Y are the mean and standard deviation of X, or ln Y, but not of Y. To obtain a more natural pair of parameters for f_Y(y), we observe that, if the medians of X and Y are denoted by θ_X and θ_Y, respectively, the definition of the median of a random variable gives

$$
0.5 = P(Y \le \theta_Y) = P(X \le \ln\theta_Y) = P(X \le \theta_X),
$$

or

$$
\ln\theta_Y = \theta_X. \tag{7.45}
$$

Since, owing to the symmetry of the normal distribution,

$$
\theta_X = m_X,
$$

we can write

$$
m_X = \ln\theta_Y. \tag{7.46}
$$
The mean and standard deviation of Y can be found either through direct integration by using f_Y(y) or by using the relationship given by Equation (7.43) together with f_X(x). In terms of θ_Y and σ_lnY (the standard deviation of ln Y, that is, σ_X), they take the forms

$$
m_Y = \theta_Y\exp\left(\frac{\sigma_{\ln Y}^2}{2}\right), \qquad \sigma_Y^2 = m_Y^2\left[\exp\left(\sigma_{\ln Y}^2\right) - 1\right]. \tag{7.48}
$$


7.3.1 PROBABILITY TABULATIONS 

Because of the close ties that exist between the normal distribution and the lognormal distribution through Equation (7.43), probability calculations involving a lognormally distributed random variable can be carried out with the aid of probability tables provided for normal random variables, as shown below.

Consider the probability distribution function of Y. We have

$$
F_Y(y) = P(Y \le y) = P(X \le \ln y) = F_X(\ln y), \qquad y > 0. \tag{7.49}
$$

Now, since the mean of X is ln θ_Y and its variance is σ²_lnY, we have

$$
F_Y(y) = F_U\left(\frac{\ln y - \ln\theta_Y}{\sigma_{\ln Y}}\right) = F_U\left[\frac{1}{\sigma_{\ln Y}}\ln\left(\frac{y}{\theta_Y}\right)\right], \qquad y > 0. \tag{7.50}
$$

Since F_U(u) is tabulated, Equation (7.50) can be used for probability calculations associated with Y with the aid of the normal probability table.

Example 7.5. Problem: the annual maximum runoff Y of a certain river can be modeled by a lognormal distribution. Suppose that the observed mean and standard deviation of Y are m_Y = 300 cfs and σ_Y = 200 cfs. Determine the probability P(Y > 400 cfs).

Answer: using Equations (7.48), parameters θ_Y and σ_lnY are solutions of the equations

$$
\theta_Y\exp\left(\frac{\sigma_{\ln Y}^2}{2}\right) = 300, \qquad \exp\left(\sigma_{\ln Y}^2\right) = \frac{4 \times 10^4}{9 \times 10^4} + 1,
$$

resulting in

$$
\theta_Y \approx 250, \qquad \sigma_{\ln Y} \approx 0.61. \tag{7.51}
$$

The desired answer is, using Equation (7.50) and Table A.3,

$$
P(Y > 400) = 1 - P(Y \le 400) = 1 - F_Y(400),
$$

where

$$
F_Y(400) = F_U\left[\frac{1}{0.61}\ln\left(\frac{400}{250}\right)\right] = F_U(0.77) = 0.7794.
$$

Hence,

$$
P(Y > 400) = 1 - 0.7794 = 0.2206.
$$
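Equations (7.48) can be inverted in closed form: σ²_lnY = ln(1 + σ_Y²/m_Y²) and θ_Y = m_Y exp(−σ²_lnY/2). The Python sketch below (helper names are ours) does this and then evaluates Equation (7.50). With the rounded parameters it reproduces the answer above. Carrying unrounded values (θ_Y ≈ 249.6, σ_lnY ≈ 0.606) gives P(Y > 400) ≈ 0.218, which shows how sensitive the tail is to rounding.

```python
import math


def F_U(u: float) -> float:
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def lognormal_params_from_moments(m_y: float, s_y: float):
    """Invert Equations (7.48): return (theta_Y, sigma_lnY) from the mean and sd of Y."""
    var_ln = math.log(1.0 + (s_y / m_y) ** 2)    # exp(sigma_lnY^2) = 1 + (s_y/m_y)^2
    theta = m_y * math.exp(-var_ln / 2.0)        # m_Y = theta_Y exp(sigma_lnY^2 / 2)
    return theta, math.sqrt(var_ln)


def lognormal_cdf(y: float, theta: float, sigma_ln: float) -> float:
    """Equation (7.50): F_Y(y) = F_U[ln(y / theta_Y) / sigma_lnY] for y > 0."""
    return F_U(math.log(y / theta) / sigma_ln) if y > 0 else 0.0


theta, sigma_ln = lognormal_params_from_moments(300.0, 200.0)
p_exact_params = 1.0 - lognormal_cdf(400.0, theta, sigma_ln)
p_rounded_params = 1.0 - lognormal_cdf(400.0, 250.0, 0.61)
print(f"theta_Y = {theta:.1f}, sigma_lnY = {sigma_ln:.3f}")
print(f"P(Y > 400) = {p_exact_params:.4f} (unrounded), {p_rounded_params:.4f} (rounded)")
```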


7.4 GAMMA AND RELATED DISTRIBUTIONS 

The gamma distribution describes another class of useful one-sided distributions. The pdf associated with the gamma distribution is

$$
f_X(x) = \begin{cases} \dfrac{\lambda^\eta}{\Gamma(\eta)}x^{\eta - 1}e^{-\lambda x}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere;} \end{cases} \tag{7.52}
$$

where Γ(η) is the well-known gamma function:

$$
\Gamma(\eta) = \int_0^\infty x^{\eta - 1}e^{-x}\,dx, \tag{7.53}
$$

which is widely tabulated, and

$$
\Gamma(\eta) = (\eta - 1)! \tag{7.54}
$$

when η is a positive integer.

The parameters associated with the gamma distribution are η and λ; both are taken to be positive. Since the gamma distribution is one-sided, physical quantities that can take values only in, say, the positive range are frequently modeled by it. Furthermore, it serves as a useful model because of its versatility, in the sense that a wide variety of shapes of the gamma density function can be obtained by varying the values of η and λ. This is illustrated in Figures 7.10(a) and 7.10(b), which show plots of Equation (7.52) for several values of η and λ. We notice from these figures that η determines the shape of the distribution and is thus a shape parameter, whereas λ is a scale parameter for the distribution. In general, the gamma density function is unimodal, with its peak at x = 0 for η ≤ 1, and at x = (η − 1)/λ for η > 1.

As we will verify in Section 7.4.1.1, the gamma distribution is also an appropriate model for the time required for a total of exactly η Poisson arrivals. Because of the wide applicability of Poisson arrivals, the gamma distribution also finds numerous applications.

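The distribution function of random variable X having a gamma distribution is

$$
F_X(x) = \begin{cases} \dfrac{\Gamma(\eta, \lambda x)}{\Gamma(\eta)}, & \text{for } x > 0; \\ 0, & \text{elsewhere.} \end{cases} \tag{7.55}
$$

In the above, Γ(η, u) is the incomplete gamma function,

$$
\Gamma(\eta, u) = \int_0^u x^{\eta - 1}e^{-x}\,dx, \tag{7.56}
$$

which is also widely tabulated. (This is the lower incomplete gamma function; many other references write it as γ(η, u) and reserve Γ(η, u) for the integral from u to ∞.)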

The mean and variance of a gamma-distributed random variable X take quite simple forms. After carrying out the necessary integration, we obtain

$$
m_X = \frac{\eta}{\lambda}, \qquad \sigma_X^2 = \frac{\eta}{\lambda^2}. \tag{7.57}
$$
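Equation (7.57) can be checked by sampling. One caution for this Python sketch: the standard library's `random.Random.gammavariate(alpha, beta)` takes a shape alpha and a *scale* beta, so the rate parameter λ of Equation (7.52) enters as beta = 1/λ. The values η = 3 and λ = 2, the seed, and the sample size are illustrative choices of ours.

```python
import random

eta, lam = 3.0, 2.0                # illustrative shape and rate parameters
rng = random.Random(11)
# random.Random.gammavariate(alpha, beta) uses shape alpha and *scale* beta,
# so the rate parameter lambda of Equation (7.52) enters as beta = 1 / lambda.
samples = [rng.gammavariate(eta, 1.0 / lam) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean {sample_mean:.4f} vs eta/lam = {eta / lam:.4f}")
print(f"variance {sample_var:.4f} vs eta/lam^2 = {eta / lam ** 2:.4f}")
```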

A number of important distributions are special cases of the gamma distribution. Two of these are discussed below in more detail.
Figure 7.10 Gamma distribution with: (a) η = 3 and λ = 5, λ = 3, and λ = 1, and (b) λ = 1 and η = 0.5, η = 1, and η = 3
7.4.1 EXPONENTIAL DISTRIBUTION

When η = 1, the gamma density function given by Equation (7.52) reduces to the exponential form

$$
f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere;} \end{cases} \tag{7.58}
$$

where λ (λ > 0) is the parameter of the distribution. Its associated PDF, mean, and variance are obtained from Equations (7.55) and (7.57) by setting η = 1. They are

$$
F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere;} \end{cases} \tag{7.59}
$$

and

$$
m_X = \frac{1}{\lambda}, \qquad \sigma_X^2 = \frac{1}{\lambda^2}. \tag{7.60}
$$


Among its many applications, two broad classes stand out. First, we will show that the exponential distribution describes interarrival time when arrivals obey the Poisson distribution. Second, it plays a central role in reliability, where the exponential distribution is one of the most important failure laws.


7.4.1.1 Interarrival Time 

There is a very close tie between the Poisson and exponential distributions. Let random variable X(0, t) be the number of arrivals in the time interval [0, t) and assume that it is Poisson distributed. Our interest now is in the time between two successive arrivals, which is, of course, also a random variable. Let this interarrival time be denoted by T. Its probability distribution function, F_T(t), is, by definition,

$$
F_T(t) = \begin{cases} P(T \le t) = 1 - P(T > t), & \text{for } t > 0; \\ 0, & \text{elsewhere.} \end{cases} \tag{7.61}
$$

In terms of X(0, t), the event T > t is equivalent to the event that there are no arrivals during time interval [0, t), or X(0, t) = 0. Hence, since P[X(0, t) = 0] = e^{−λt} as given by Equation (6.40), we have

$$
F_T(t) = \begin{cases} 1 - e^{-\lambda t}, & \text{for } t > 0; \\ 0, & \text{elsewhere.} \end{cases} \tag{7.62}
$$


Comparing this expression with Equation (7.59), we can establish the result that the interarrival time between Poisson arrivals has an exponential distribution; the parameter λ in the distribution of T is the mean arrival rate associated with Poisson arrivals.

Example 7.6. Problem: referring to Example 6.11 (page 177), determine the probability that the headway (spacing measured in time) between arriving vehicles is at least 2 minutes. Also, compute the mean headway.

Answer: in Example 6.11, the parameter λ was estimated to be 4.16 vehicles per minute. Hence, if T is the headway in minutes, we have

$$
P(T > 2) = \int_2^\infty f_T(t)\,dt = 1 - F_T(2) = e^{-2(4.16)} \approx 0.00024.
$$

The mean headway is

$$
m_T = \frac{1}{\lambda} = \frac{1}{4.16}\ \text{minutes} \approx 0.24\ \text{minutes}.
$$
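The two numbers in Example 7.6 follow directly from Equations (7.60) and (7.62). As an extra check on the mean, this Python sketch also simulates exponential headways with the standard library's `random.Random.expovariate`, which takes the rate λ directly. The seed and sample size are arbitrary.

```python
import math
import random

lam = 4.16                               # vehicles per minute, from Example 6.11
p_at_least_2 = math.exp(-lam * 2.0)      # 1 - F_T(2), with F_T from Equation (7.62)
mean_headway = 1.0 / lam                 # m_T = 1/lambda, Equation (7.60)

# Simulated headways; expovariate takes the rate lambda directly.
rng = random.Random(3)
n = 100_000
sim_mean = sum(rng.expovariate(lam) for _ in range(n)) / n
print(f"P(T > 2) = {p_at_least_2:.5f}; mean headway = {mean_headway:.4f} min "
      f"(simulated {sim_mean:.4f} min)")
```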

Since interarrival times for Poisson arrivals are independent, the time required for a total of n Poisson arrivals is a sum of n independent and exponentially distributed random variables. Let T_j, j = 1, 2, ..., n, be the interarrival time between the (j − 1)th and jth arrivals. The time required for a total of n arrivals, denoted by X_n, is

$$
X_n = T_1 + T_2 + \cdots + T_n, \tag{7.63}
$$

where T_j, j = 1, 2, ..., n, are independent and exponentially distributed with the same parameter λ. In Example 4.16 (page 105), we showed that X_n has a gamma distribution with η = 2 when n = 2. The same procedure immediately shows that, for general n, X_n is gamma-distributed with η = n. Thus, as stated, the gamma distribution is appropriate for describing the time required for a total of η Poisson arrivals.

Example 7.7. Problem: ferries depart for trips across a river as soon as nine vehicles are driven aboard. It is observed that vehicles arrive independently at an average rate of 6 per hour. Determine the probability that the time between trips will be less than 1 hour.

Answer: from our earlier discussion, the time between trips follows a gamma distribution with η = 9 and λ = 6. Hence, let X be the time between trips in hours; its density function and distribution function are given by Equations (7.52) and (7.55). The desired result is, using Equation (7.55),

$$
P(X < 1) = F_X(1) = \frac{\Gamma(\eta, \lambda)}{\Gamma(\eta)} = \frac{\Gamma(9, 6)}{\Gamma(9)}.
$$

Now, Γ(9) = 8!, and the incomplete gamma function Γ(9, 6) can be obtained by table lookup. We obtain

$$
P(X < 1) = 0.153.
$$
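For integer η the ratio Γ(η, λx)/Γ(η) can be evaluated without tables, using the Poisson-arrival interpretation given above: X ≤ x exactly when at least η arrivals occur in [0, x), so F_X(x) = 1 − Σ_{k=0}^{η−1} e^{−λx}(λx)^k/k!. The Python sketch below (helper names are ours) applies this to Example 7.7. As a cross-check, it also integrates Equation (7.52) by Simpson's rule.

```python
import math


def gamma_cdf_integer_shape(x: float, eta: int, lam: float) -> float:
    """F_X(x) of Equation (7.55) for a positive integer eta, via Poisson probabilities."""
    if x <= 0.0:
        return 0.0
    u = lam * x
    term = math.exp(-u)          # P(no arrivals in [0, x))
    below = term
    for k in range(1, eta):      # add P(k arrivals), k = 1, ..., eta - 1
        term *= u / k
        below += term
    return 1.0 - below


def gamma_pdf(x: float, eta: float, lam: float) -> float:
    """Gamma density of Equation (7.52) for x >= 0."""
    return lam ** eta * x ** (eta - 1) * math.exp(-lam * x) / math.gamma(eta)


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


p_poisson = gamma_cdf_integer_shape(1.0, eta=9, lam=6.0)
p_simpson = simpson(lambda x: gamma_pdf(x, 9, 6.0), 0.0, 1.0)
print(f"P(X < 1): {p_poisson:.4f} (Poisson sum), {p_simpson:.4f} (Simpson)")
```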


An alternative computational procedure for determining P(X < 1) in Example 7.7 can be found by noting from Equation (7.63) that random variable X can be represented by a sum of η independent random variables. Hence, according to the central limit theorem, its distribution approaches that of a normal random variable when η is large. Thus, provided that η is large, computations such as that required in Example 7.7 can be carried out by using Table A.3 for normal random variables. Let us again consider Example 7.7. Approximating X by a normal random variable, the desired probability is [see Equation (7.25)]

$$
P(X < 1) \approx P\left(U < \frac{1 - m_X}{\sigma_X}\right),
$$

where U is the standardized normal random variable. The mean and standard deviation of X are, using Equations (7.57),

$$
m_X = \frac{\eta}{\lambda} = \frac{9}{6} = \frac{3}{2}, \qquad \sigma_X = \frac{\eta^{1/2}}{\lambda} = \frac{3}{6} = \frac{1}{2}.
$$

Hence, with the aid of Table A.3,

$$
P(X < 1) \approx P(U < -1) = F_U(-1) = 1 - F_U(1) = 1 - 0.8413 = 0.159,
$$

which is quite close to the answer obtained in Example 7.7.
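The quality of the normal approximation can be examined by comparing it with the exact gamma probability at the point one standard deviation below the mean, x = (η − η^{1/2})/λ, where the approximation always gives F_U(−1) ≈ 0.159. This Python sketch keeps λ = 6 and varies η, an illustrative choice of ours. For η = 9 it reproduces the two answers above, and the gap shrinks as η grows, in line with the central limit theorem argument.

```python
import math


def F_U(u: float) -> float:
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def gamma_cdf_integer_shape(x: float, eta: int, lam: float) -> float:
    """Exact gamma F_X(x) for integer eta: P(at least eta Poisson arrivals in [0, x))."""
    u = lam * x
    term = math.exp(-u)
    below = term
    for k in range(1, eta):
        term *= u / k
        below += term
    return 1.0 - below


lam = 6.0
gaps = []
for eta in (9, 36, 100):
    m_x, s_x = eta / lam, math.sqrt(eta) / lam          # Equation (7.57)
    x = m_x - s_x                                        # one standard deviation below the mean
    exact = gamma_cdf_integer_shape(x, eta, lam)
    approx = F_U((x - m_x) / s_x)                        # = F_U(-1) for every eta
    gaps.append(abs(exact - approx))
    print(f"eta = {eta:3d}: exact {exact:.4f}, normal approx {approx:.4f}")
```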
7.4.1.2 Reliability and Exponential Failure Law

One can infer from our discussion on interarrival time that many analogous situations can be treated by applying the exponential distribution. In reliability studies, the time to failure for a physical component or a system is expected to be exponentially distributed if the unit fails as soon as some single event, such as malfunction of a component, occurs, assuming such events happen independently. In order to gain more insight into failure processes, let us introduce some basic notions in reliability.

Let random variable T be the time to failure of a component or system. It is useful to consider a function that gives the probability of failure during a small time increment, assuming that no failure occurred before that time. This function, denoted by h(t), is called the hazard function or failure rate and is defined by

$$
h(t)\,dt = P(t < T \le t + dt \mid T > t), \tag{7.64}
$$

which, since P(t < T ≤ t + dt) = f_T(t) dt for small dt and P(T > t) = 1 − F_T(t), gives

$$
h(t) = \frac{f_T(t)}{1 - F_T(t)}. \tag{7.65}
$$


In reliability studies, a hazard function appropriate for many phenomena takes the form of the so-called ‘bathtub curve’, shown in Figure 7.11. The initial portion of the curve represents ‘infant mortality’, attributable to component defects and manufacturing imperfections. The relatively constant portion of the h(t) curve represents the in-usage period in which failure is largely a result of chance failure. Wear-out failure near the end of component life is shown as the increasing portion of the h(t) curve. System reliability can be optimized by initial ‘burn-in’ until time t_1 to avoid premature failure and by part replacement at time t_2 to avoid wear-out.

We can now show that the exponential failure law is appropriate during the ‘in-usage’ period of a system’s normal life. Substituting

$$
f_T(t) = \lambda e^{-\lambda t}, \quad t \ge 0,
$$

and

$$
F_T(t) = 1 - e^{-\lambda t}, \quad t \ge 0,
$$

into Equation (7.65), we immediately have

$$
h(t) = \lambda. \tag{7.66}
$$

We see from the above that parameter λ in the exponential distribution plays the role of a (constant) failure rate.
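Equation (7.65) can be evaluated for any pair f_T, F_T. The Python sketch below confirms that the exponential law gives a constant hazard equal to λ. For contrast, it also evaluates the gamma case η = 2 discussed in the next paragraph (one operating unit plus one standby), whose hazard λ²t/(1 + λt) increases toward λ. The value λ = 0.5 and the function names are illustrative.

```python
import math


def hazard(pdf, cdf, t: float) -> float:
    """h(t) = f_T(t) / [1 - F_T(t)], Equation (7.65)."""
    return pdf(t) / (1.0 - cdf(t))


LAM = 0.5   # illustrative failure rate (failures per unit time)


def exp_pdf(t):
    return LAM * math.exp(-LAM * t)


def exp_cdf(t):
    return 1.0 - math.exp(-LAM * t)


def gamma2_pdf(t):
    # Gamma density with eta = 2: one operating unit plus one standby unit
    return LAM ** 2 * t * math.exp(-LAM * t)


def gamma2_cdf(t):
    return 1.0 - math.exp(-LAM * t) * (1.0 + LAM * t)


for t in (0.5, 2.0, 8.0):
    print(f"t = {t}: exponential h = {hazard(exp_pdf, exp_cdf, t):.4f}, "
          f"gamma(eta=2) h = {hazard(gamma2_pdf, gamma2_cdf, t):.4f}")
```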

We have seen in Example 7.7 that the gamma distribution is appropriate to describe the time required for a total of η arrivals. In the context of failure laws, the gamma distribution can be thought of as a generalization of the exponential failure law for systems that fail as soon as exactly η events have occurred, assuming events take place according to the Poisson law. Thus, the gamma distribution is appropriate as a time-to-failure model for systems having one operating unit and η − 1 standby units; these standby units go into operation sequentially, and each one has an exponential time-to-failure distribution with the same parameter λ.


7.4.2 CHI-SQUARED DISTRIBUTION 

Another important special case of the gamma distribution is the chi-squared (χ²) distribution, obtained by setting λ = 1/2 and η = n/2 in Equation (7.52), where n is a positive integer. The distribution thus contains one parameter, n, with pdf of the form

$$
f_X(x) = \begin{cases} \dfrac{1}{2^{n/2}\Gamma(n/2)}x^{(n/2) - 1}e^{-x/2}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere.} \end{cases} \tag{7.67}
$$
The parameter n is generally referred to as the degrees of freedom. The utility of this distribution arises from the fact that a sum of the squares of n independent standardized normal random variables has a χ² distribution with n degrees of freedom; that is, if U_1, U_2, ..., and U_n are independent and distributed as N(0, 1), the sum

$$
X = U_1^2 + U_2^2 + \cdots + U_n^2 \tag{7.68}
$$

has a χ² distribution with n degrees of freedom. One can verify this statement by determining the characteristic function of each U_j² (see Example 5.7, page 132) and using the method of characteristic functions as discussed in Section 4.5 for sums of independent random variables.

Because of this relationship, the χ² distribution is one of our main tools in the area of statistical inference and hypothesis testing. These applications are detailed in Chapter 10.


Figure 7.12 The χ² distribution for n = 1, n = 2, n = 4, and n = 6
The pdf f_X(x) in Equation (7.67) is plotted in Figure 7.12 for several values of n. It is seen that, as n increases, the shape of f_X(x) becomes more symmetric. In view of Equation (7.68), since X can be expressed as a sum of independent and identically distributed random variables, we expect on the basis of the central limit theorem that the χ² distribution approaches a normal distribution as n → ∞.

The mean and variance of random variable X having a χ² distribution are easily obtained from Equation (7.57) as

$$
m_X = n, \qquad \sigma_X^2 = 2n. \tag{7.69}
$$
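A Monte Carlo check of Equations (7.68) and (7.69), written as a Python sketch: summing the squares of n independent N(0, 1) draws should give a sample mean near n and a sample variance near 2n. We take n = 2 because Equation (7.67) then reduces to the exponential density with λ = 1/2, which provides a closed-form distribution function for comparison. The seed and sample size are arbitrary.

```python
import math
import random

rng = random.Random(5)
n, trials = 2, 100_000
# X = U_1^2 + ... + U_n^2 with independent standardized normal U_j, Equation (7.68)
xs = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
# For n = 2, Equation (7.67) is the exponential density with lambda = 1/2,
# so P(X <= 2) = 1 - exp(-1).
frac_le_2 = sum(x <= 2.0 for x in xs) / trials
print(f"mean {mean:.3f} (n = {n}), variance {var:.3f} (2n = {2 * n}), "
      f"P(X <= 2) {frac_le_2:.4f} vs {1 - math.exp(-1):.4f}")
```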


7.5 BETA AND RELATED DISTRIBUTIONS 

Whereas the lognormal and gamma distributions provide a diversity of one-sided probability distributions, the beta distribution is rich in providing varied probability distributions over a finite interval. The beta distribution is characterized by the density function

$$
f_X(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1 - x)^{\beta - 1}, & \text{for } 0 \le x \le 1; \\ 0, & \text{elsewhere;} \end{cases} \tag{7.70}
$$

where parameters α and β take only positive values. The coefficient of f_X(x), Γ(α + β)/[Γ(α)Γ(β)], can be represented by 1/B(α, β), where

$$
B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)} \tag{7.71}
$$

is known as the beta function, hence the name for the distribution given by Equation (7.70).

The parameters α and β are both shape parameters; different combinations of their values permit the density function to take on a wide variety of shapes. When α, β > 1, the distribution is unimodal, with its peak at x = (α − 1)/(α + β − 2). It becomes U-shaped when α, β < 1; it is J-shaped when α > 1 and β < 1; and it takes the shape of an inverted J when α < 1 and β > 1. Finally, as a special case, the uniform distribution over interval (0, 1) results when α = β = 1. Some of these possible shapes are displayed in Figures 7.13(a) and 7.13(b).
Figure 7.13 Beta distribution with: (a) β = 2 and α = 1.0, α = 0.8, α = 0.5, and α = 0.2; and (b) combinations of values of α and β (α, β = 1, 2, ..., 7) such that α + β = 8
The mean and variance of a beta-distributed random variable X are, following straightforward integrations,

$$
m_X = \frac{\alpha}{\alpha + \beta}, \qquad \sigma_X^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}. \tag{7.72}
$$


Because of its versatility as a distribution over a finite interval, the beta distribution is used to represent a large number of physical quantities for which values are restricted to an identifiable interval. Some of the areas of application are tolerance limits, quality control, and reliability.
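Equation (7.72) can be checked by sampling with the standard library's `random.Random.betavariate(alpha, beta)`, which uses the same two shape parameters as Equation (7.70). In this Python sketch, α = 2 and β = 5, the seed, and the sample size are illustrative choices.

```python
import random

alpha, beta = 2.0, 5.0            # illustrative shape parameters
rng = random.Random(13)
xs = [rng.betavariate(alpha, beta) for _ in range(100_000)]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
m_theory = alpha / (alpha + beta)                                      # Equation (7.72)
v_theory = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))   # Equation (7.72)
print(f"mean {mean:.4f} vs {m_theory:.4f}; variance {var:.5f} vs {v_theory:.5f}")
```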

An interesting situation in which the beta distribution arises is as follows. Suppose a random phenomenon Y can be observed independently n times and, after these n independent observations are ranked in order of increasing magnitude, let y_r and y_{n−s+1} be the values of the rth smallest and sth largest observations, respectively. If random variable X is used to denote the proportion of the original Y taking values between y_r and y_{n−s+1}, it can be shown that X follows a beta distribution with α = n − r − s + 1 and β = r + s; that is,

$$
f_X(x) = \begin{cases} \dfrac{\Gamma(n + 1)}{\Gamma(n - r - s + 1)\Gamma(r + s)}x^{n - r - s}(1 - x)^{r + s - 1}, & \text{for } 0 \le x \le 1; \\ 0, & \text{elsewhere.} \end{cases} \tag{7.73}
$$

This result can be found in Wilks (1942). We will not prove this result but we will use it in the next section, in Example 7.8.


7.5.1 PROBABILITY TABULATIONS

The probability distribution function associated with the beta distribution is

$$
F_X(x) = \begin{cases} 0, & \text{for } x < 0; \\ \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\displaystyle\int_0^x u^{\alpha - 1}(1 - u)^{\beta - 1}\,du, & \text{for } 0 \le x \le 1; \\ 1, & \text{for } x > 1; \end{cases} \tag{7.74}
$$

which can be integrated directly. It also has the form of an incomplete beta function, for which values for given values of α and β can be found from mathematical tables. The incomplete beta function is usually denoted by I_x(α, β). If we write F_X(x) with parameters α and β in the form F(x; α, β), the correspondence between I_x(α, β) and F(x; α, β) is determined as follows. If α ≥ β, then

$$
F(x; \alpha, \beta) = I_x(\alpha, \beta). \tag{7.75}
$$

If α < β, then

$$
F(x; \alpha, \beta) = 1 - I_{1 - x}(\beta, \alpha). \tag{7.76}
$$

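Another method of evaluating $F_X(x)$ in Equation (7.74) is to note the 
similarity in form between $f_X(x)$ and $p_Y(k)$ of a binomial random variable $Y$ 
for the case where $\alpha$ and $\beta$ are positive integers. We see from Equation 
(6.2) that

$$p_Y(k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n. \qquad (7.77)$$

Also, $f_X(x)$ in Equation (7.70) with $\alpha$ and $\beta$ being positive integers takes the 
form

$$f_X(x) = \frac{(\alpha+\beta-1)!}{(\alpha-1)!\,(\beta-1)!}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad \alpha, \beta = 1, 2, \ldots, \quad 0 < x < 1, \qquad (7.78)$$

and we easily establish the relationship

$$f_X(x) = (\alpha + \beta - 1)\, p_Y(k), \quad \alpha, \beta = 1, 2, \ldots, \quad 0 < x < 1, \qquad (7.79)$$

where $p_Y(k)$ is evaluated at $k = \alpha - 1$, with $n = \alpha + \beta - 2$ and $p = x$. For 
example, the value of $f_X(0.5)$ with $\alpha = 2$ and $\beta = 1$ is numerically equal to 
$2p_Y(1)$ with $n = 1$ and $p = 0.5$; here $p_Y(1)$ can be found from Equation (7.77) 
or from Table A.1 for binomial random variables.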
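Equation (7.79) is easy to confirm numerically. The following is a minimal sketch; the helper names are hypothetical, and it evaluates the beta pdf with `math.gamma` and the binomial pmf with `math.comb` instead of consulting Table A.1.

```python
from math import comb, gamma

def beta_pdf(x, a, b):
    """Beta pdf of Equation (7.70) for 0 < x < 1."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

def binom_pmf(k, n, p):
    """Binomial pmf of Equation (7.77)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def beta_pdf_via_binomial(x, a, b):
    """Equation (7.79) for positive integers a, b: k = a - 1, n = a + b - 2, p = x."""
    return (a + b - 1) * binom_pmf(a - 1, a + b - 2, x)

# The worked case in the text: a = 2, b = 1, x = 0.5
print(beta_pdf(0.5, 2, 1), beta_pdf_via_binomial(0.5, 2, 1))  # 1.0 1.0
```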

Similarly, the relationship between $F_X(x)$ and $F_Y(k)$ can be established. It 
takes the form

$$F_X(x) = 1 - F_Y(k), \quad \alpha, \beta = 1, 2, \ldots, \quad 0 < x < 1, \qquad (7.80)$$

with $k = \alpha - 1$, $n = \alpha + \beta - 1$, and $p = x$. Note that $n$ here is one larger than in 
Equation (7.79): $F_X(x)$ equals the probability of at least $\alpha$ successes in 
$\alpha + \beta - 1$ independent trials, each with success probability $x$. The PDF $F_Y(y)$ for a binomial 
random variable $Y$ is also widely tabulated, and it can be used to advantage 
here for evaluating $F_X(x)$ associated with the beta distribution.

Example 7.8. Problem: in order to establish quality limits for a manufactured 
item, 10 independent samples are taken at random and the quality limits are 
established by using the lowest and highest sample values. What is the 
probability that at least 50% of the manufactured items will fall within these limits?

Answer: let $X$ be the proportion of items taking values within the established 
limits. Its pdf thus takes the form of Equation (7.73), with $n = 10$, $r = 1$, and 
$s = 1$. Hence, $\alpha = 10 - 1 - 1 + 1 = 9$, $\beta = 1 + 1 = 2$, and

$$f_X(x) = \begin{cases} \dfrac{10!}{8!}\, x^8(1-x) = 90\,x^8(1-x), & \text{for } 0 \le x \le 1;\\[4pt] 0, & \text{elsewhere}. \end{cases}$$

The desired probability is

$$P(X > 0.50) = 1 - P(X \le 0.50) = 1 - F_X(0.50). \qquad (7.81)$$

According to Equation (7.80), the value of $F_X(0.50)$ can be found from

$$F_X(0.50) = 1 - F_Y(k), \qquad (7.82)$$

where $Y$ is binomial and $k = \alpha - 1 = 8$, $n = \alpha + \beta - 1 = 10$, and $p = 0.50$. 
Using Table A.1, we find that

$$F_Y(8) = 1 - p_Y(9) - p_Y(10) = 1 - 0.010 - 0.001 = 0.989. \qquad (7.83)$$

Equations (7.81) and (7.82) yield

$$P(X > 0.50) = 1 - F_X(0.50) = 1 - [1 - F_Y(8)] = F_Y(8) = 0.989. \qquad (7.84)$$

Direct integration of $f_X(x)$ gives $F_X(0.50) = 11/1024 \approx 0.011$, in agreement with this result.
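Example 7.8 can be reproduced by computing the same quantity in two ways. The sketch below makes two assumptions: exact binomial sums stand in for Table A.1, and the beta integral for integer $\alpha, \beta$ is evaluated term by term after a binomial expansion. The helper names are hypothetical.

```python
from math import comb, factorial

def binom_cdf(k, n, p):
    """F_Y(k) = P(Y <= k) for a binomial random variable Y with parameters n, p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def beta_cdf_integer(x, a, b):
    """Beta PDF for positive integers a, b as 1 - F_Y(a - 1),
    where Y is binomial with n = a + b - 1 and p = x."""
    return 1.0 - binom_cdf(a - 1, a + b - 1, x)

def beta_cdf_direct(x, a, b):
    """Same quantity by integrating Equation (7.78) term by term,
    after expanding (1 - u)**(b - 1) with the binomial theorem."""
    const = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))
    return const * sum(comb(b - 1, j) * (-1) ** j * x ** (a + j) / (a + j)
                       for j in range(b))

# Example 7.8: n = 10 samples, r = s = 1, hence a = 9 and b = 2
a, b = 10 - 1 - 1 + 1, 1 + 1
print(beta_cdf_integer(0.5, a, b), beta_cdf_direct(0.5, a, b))  # both approx. 11/1024
p_at_least_half = 1.0 - beta_cdf_integer(0.5, a, b)
print(round(p_at_least_half, 3))  # 0.989
```

The two routes agree, which serves as a check on the choice $n = \alpha + \beta - 1$ in the binomial form.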


7.5.2 GENERALIZED BETA DISTRIBUTION 

The beta distribution can be easily generalized from one restricted to the unit 
interval (0, 1) to one covering an arbitrary interval $(a, b)$. Let $Y$ be such 
a generalized beta random variable. It is clear that the desired transformation is

$$Y = (b - a)X + a, \qquad (7.85)$$

where $X$ is beta-distributed according to Equation (7.70). Equation (7.85) 
represents a monotonic transformation from $X$ to $Y$, and the procedure 
developed in Chapter 5 can be applied to determine the pdf of $Y$ in a 
straightforward manner. Following Equation (5.12), we have

$$f_Y(y) = \begin{cases} \dfrac{1}{b-a}\,\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \left(\dfrac{y-a}{b-a}\right)^{\alpha-1}\left(\dfrac{b-y}{b-a}\right)^{\beta-1}, & \text{for } a \le y \le b;\\[4pt] 0, & \text{elsewhere}. \end{cases} \qquad (7.86)$$
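A quick numerical check of Equation (7.86) confirms that it integrates to one and has mean $a + (b - a)\alpha/(\alpha + \beta)$. The sketch below is a minimal version under stated assumptions: the midpoint rule is our choice of quadrature, and the interval (2, 5) with $\alpha = 2$, $\beta = 3$ is a hypothetical example. The interval endpoints are renamed `lo` and `hi` so they are not confused with the shape parameters.

```python
from math import gamma

def gen_beta_pdf(y, lo, hi, alpha, beta):
    """Generalized beta pdf of Equation (7.86) on the interval (lo, hi)."""
    if not lo < y < hi:
        return 0.0
    x = (y - lo) / (hi - lo)          # inverse of Equation (7.85)
    c = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return c * x ** (alpha - 1) * (1 - x) ** (beta - 1) / (hi - lo)

def midpoint_integral(f, lo, hi, steps=10000):
    """Approximate the integral of f over (lo, hi) by the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

lo, hi, alpha, beta = 2.0, 5.0, 2.0, 3.0
total = midpoint_integral(lambda y: gen_beta_pdf(y, lo, hi, alpha, beta), lo, hi)
mean = midpoint_integral(lambda y: y * gen_beta_pdf(y, lo, hi, alpha, beta), lo, hi)
print(total, mean)  # close to 1 and to lo + (hi - lo) * alpha / (alpha + beta) = 3.2
```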


7.6 EXTREME-VALUE DISTRIBUTIONS 

A structural engineer, concerned with the safety of a structure, is often 
interested in the maximum load and maximum stress in structural members. In 
reliability studies, the distribution of the life of a system having n components 
in series (where the system fails if any component fails) is a function of the 
minimum time to failure of these components, whereas for a system with a 
parallel arrangement (where the system fails when all components fail) it is 
determined by the distribution of maximum time to failure. These examples 
point to our frequent concern with distributions of maximum or minimum 
values of a number of random variables. 

To fix ideas, let $X_j$, $j = 1, 2, \ldots, n$, denote the $j$th gust velocity of $n$ gusts 
occurring in a year, and let $Y_n$ denote the annual maximum gust velocity. We 
are interested in the probability distribution of $Y_n$ in terms of those of $X_j$. In the 
following development, attention is given to the case where random variables 
$X_j$, $j = 1, 2, \ldots, n$, are independent and identically distributed with PDF $F_X(x)$ 
and pdf $f_X(x)$ or pmf $p_X(x)$. Furthermore, asymptotic results for $n \to \infty$ are 
our primary concern. For the wind-gust example given above, these conditions 
are not unreasonable in determining the distribution of annual maximum gust 
velocity. We will also determine, under the same conditions, the minimum $Z_n$ 
of random variables $X_1, X_2, \ldots$, and $X_n$, which is also of interest in practical 
applications. 

The random variables $Y_n$ and $Z_n$ are defined by

$$Y_n = \max(X_1, X_2, \ldots, X_n),$$

$$Z_n = \min(X_1, X_2, \ldots, X_n).$$

The PDF of $Y_n$ is

$$F_{Y_n}(y) = P(Y_n \le y) = P(\text{all } X_j \le y) = P(X_1 \le y \cap X_2 \le y \cap \cdots \cap X_n \le y).$$

Assuming independence, we have

$$F_{Y_n}(y) = F_{X_1}(y)\,F_{X_2}(y) \cdots F_{X_n}(y), \qquad (7.88)$$

and, if each $F_{X_j}(y) = F_X(y)$, the result is

$$F_{Y_n}(y) = [F_X(y)]^n. \qquad (7.89)$$

The pdf of $Y_n$ can be easily derived from the above. When the $X_j$ are 
continuous, it has the form

$$f_{Y_n}(y) = \frac{dF_{Y_n}(y)}{dy} = n[F_X(y)]^{n-1} f_X(y). \qquad (7.90)$$

The PDF of $Z_n$ is determined in a similar fashion. In this case,

$$\begin{aligned} F_{Z_n}(z) &= P(Z_n \le z) = P(\text{at least one } X_j \le z)\\ &= P(X_1 \le z \cup X_2 \le z \cup \cdots \cup X_n \le z)\\ &= 1 - P(X_1 > z \cap X_2 > z \cap \cdots \cap X_n > z). \end{aligned}$$

When the $X_j$ are independent and identically distributed, the foregoing gives

$$F_{Z_n}(z) = 1 - [1 - F_{X_1}(z)][1 - F_{X_2}(z)] \cdots [1 - F_{X_n}(z)] = 1 - [1 - F_X(z)]^n. \qquad (7.91)$$

If random variables $X_j$ are continuous, the pdf of $Z_n$ is

$$f_{Z_n}(z) = n[1 - F_X(z)]^{n-1} f_X(z). \qquad (7.92)$$
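Equations (7.89) and (7.91) can be compared with simulated maxima and minima. The sketch below is for illustration only. It assumes uniform (0, 1) initial variables, $n = 5$, and a fixed random seed, all hypothetical choices. Following the text's usage, "PDF" in the docstrings means the probability distribution function.

```python
import random

def cdf_max(F, n):
    """Equation (7.89): PDF of Y_n, the maximum of n iid variables with PDF F."""
    return lambda y: F(y) ** n

def cdf_min(F, n):
    """Equation (7.91): PDF of Z_n, the minimum of n iid variables with PDF F."""
    return lambda z: 1.0 - (1.0 - F(z)) ** n

def uniform_cdf(x):
    """PDF of a Uniform(0, 1) random variable."""
    return min(max(x, 0.0), 1.0)

n, trials = 5, 20000
rng = random.Random(7)
samples = [[rng.random() for _ in range(n)] for _ in range(trials)]
emp_max = sum(max(row) <= 0.8 for row in samples) / trials
emp_min = sum(min(row) <= 0.2 for row in samples) / trials
print(emp_max, cdf_max(uniform_cdf, n)(0.8))   # theory: 0.8**5 = 0.32768
print(emp_min, cdf_min(uniform_cdf, n)(0.2))   # theory: 1 - 0.8**5 = 0.67232
```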

The next step in our development is to determine the forms of $F_{Y_n}(y)$ and 
$F_{Z_n}(z)$ expressed by Equations (7.89) and (7.91) as $n \to \infty$. Since the initial 
distribution $F_X(x)$ of each $X_j$ is sometimes unavailable, we wish to examine 
whether Equations (7.89) and (7.91) lead to unique distributions for $F_{Y_n}(y)$ and 
$F_{Z_n}(z)$, respectively, independent of the form of $F_X(x)$. This is not unlike 
looking for results similar to the powerful ones we obtained for the normal 
and lognormal distributions via the central limit theorem. 

Although the distribution functions $F_{Y_n}(y)$ and $F_{Z_n}(z)$ become increasingly 
insensitive to exact distributional features of $X_j$ as $n \to \infty$, no unique results 
can be obtained that are completely independent of the form of $F_X(x)$. Some 
features of the distribution function $F_X(x)$ are important and, in what follows, 
the asymptotic forms of $F_{Y_n}(y)$ and $F_{Z_n}(z)$ are classified into three types based 
on general features in the distribution tails of $X_j$. Type I is sometimes referred 
to as Gumbel's extreme value distribution, and included in Type III is the 
important Weibull distribution. 


7.6.1 TYPE-I ASYMPTOTIC DISTRIBUTIONS OF EXTREME 
VALUES 

Consider first the Type-I asymptotic distribution of maximum values. It is the 
limiting distribution of $Y_n$ (as $n \to \infty$) from an initial distribution $F_X(x)$ of 
which the right tail is unbounded and is of an exponential type; that is, $F_X(x)$ 
approaches 1 at least as fast as an exponential distribution. For this case, we 
can express $F_X(x)$ in the form

$$F_X(x) = 1 - \exp[-g(x)], \qquad (7.93)$$

where $g(x)$ is an increasing function of $x$. A number of important distributions 
fall into this category, such as the normal, lognormal, and gamma distributions. 
Let

$$\lim_{n \to \infty} Y_n = Y. \qquad (7.94)$$

We have the following important result (Theorem 7.6). 

Theorem 7.6: let random variables $X_1, X_2, \ldots$, and $X_n$ be independent and 
identically distributed with the same PDF $F_X(x)$. If $F_X(x)$ is of the form given 
by Equation (7.93), we have

$$F_Y(y) = \exp\{-\exp[-\alpha(y - u)]\}, \quad -\infty < y < \infty, \qquad (7.95)$$

where $\alpha$ ($\alpha > 0$) and $u$ are two parameters of the distribution. 

Proof of Theorem 7.6: we shall only sketch the proof here; see Gumbel (1958) 
for a more comprehensive and rigorous treatment. 

Let us first define a quantity $u_n$, known as the characteristic value of $Y_n$, by

$$F_X(u_n) = 1 - \frac{1}{n}. \qquad (7.96)$$

It is thus the value of $X_j$, $j = 1, 2, \ldots, n$, at which $P(X_j \le u_n) = 1 - 1/n$. As $n$ 
becomes large, $F_X(u_n)$ approaches unity; that is, $u_n$ is in the extreme right-hand tail 
of the distribution. It can also be shown that $u_n$ is the mode of $Y_n$, which can 
be verified, in the case of $X_j$ being continuous, by taking the derivative of $f_{Y_n}(y)$ 
in Equation (7.90) with respect to $y$ and setting it to zero. 

If $F_X(x)$ takes the form given by Equation (7.93), we have

$$1 - \exp[-g(u_n)] = 1 - \frac{1}{n},$$

or

$$\frac{\exp[g(u_n)]}{n} = 1. \qquad (7.97)$$

Now, consider $F_{Y_n}(y)$ defined by Equation (7.89). In view of Equation (7.93), 
it takes the form

$$F_{Y_n}(y) = \{1 - \exp[-g(y)]\}^n = \left\{1 - \frac{\exp[g(u_n)]\exp[-g(y)]}{n}\right\}^n = \left\{1 - \frac{\exp\{-[g(y) - g(u_n)]\}}{n}\right\}^n. \qquad (7.98)$$

In the above, we have introduced into the equation the factor $\exp[g(u_n)]/n$, 
which is unity, as shown by Equation (7.97). 

Since $u_n$ is the mode or the 'most likely' value of $Y_n$, function $g(y)$ in 
Equation (7.98) can be expanded in powers of $(y - u_n)$ in the form

$$g(y) = g(u_n) + \alpha_n (y - u_n) + \cdots, \qquad (7.99)$$

where $\alpha_n = dg(y)/dy$ is evaluated at $y = u_n$. It is positive, as $g(y)$ is an increasing 
function of $y$. Retaining only up to the linear term in Equation (7.99) and 
substituting it into Equation (7.98), we obtain

$$F_{Y_n}(y) = \left\{1 - \frac{\exp[-\alpha_n(y - u_n)]}{n}\right\}^n, \qquad (7.100)$$

in which $\alpha_n$ and $u_n$ are functions only of $n$ and not of $y$. Using the identity

$$\lim_{n \to \infty}\left(1 - \frac{c}{n}\right)^n = \exp(-c)$$

for any real $c$, Equation (7.100) tends, as $n \to \infty$, to

$$F_Y(y) = \exp\{-\exp[-\alpha(y - u)]\}, \qquad (7.101)$$

which was to be proved. In arriving at Equation (7.101), we have assumed that 
as $n \to \infty$, $F_{Y_n}(y)$ converges to $F_Y(y)$ as $Y_n$ converges to $Y$ in some probabilistic 
sense. 
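Convergence to Equation (7.101) can be watched numerically for a case where everything is explicit. The sketch below assumes a unit exponential initial distribution, $F_X(x) = 1 - e^{-x}$. This is a hypothetical choice that fits Equation (7.93) with $g(x) = x$, giving $u_n = \ln n$ from Equation (7.96) and $\alpha_n = 1$. The grid of $y$ values is also our own choice.

```python
import math

def gumbel_max_cdf(y, alpha, u):
    """Type-I maximum-value PDF, Equation (7.101)."""
    return math.exp(-math.exp(-alpha * (y - u)))

def exact_max_cdf(y, n):
    """Equation (7.89) for a unit exponential initial distribution,
    F_X(x) = 1 - exp(-x), x >= 0."""
    return (1.0 - math.exp(-y)) ** n if y > 0 else 0.0

def max_error(n):
    """Largest gap between Eq. (7.89) and Eq. (7.101) on a grid of y values.
    Here g(x) = x, so u_n = ln n from Eq. (7.96) and alpha_n = g'(u_n) = 1."""
    u_n = math.log(n)
    ys = [u_n + t / 10 for t in range(-20, 41)]
    return max(abs(exact_max_cdf(y, n) - gumbel_max_cdf(y, 1.0, u_n)) for y in ys)

for n in (10, 100, 10000):
    print(n, max_error(n))
```

Because $g(x)$ is linear here, Equation (7.100) is exact, and the only remaining error is the limit $(1 - c/n)^n \to e^{-c}$. That error shrinks roughly in proportion to $1/n$.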
The mean and variance associated with the Type-I maximum-value distribution 
can be obtained through integration using Equation (7.90). We have noted 
that $u$ is the mode of the distribution, that is, the value of $y$ at which $f_Y(y)$ is 
maximum. The mean of $Y$ is

$$m_Y = u + \frac{\gamma}{\alpha}, \qquad (7.102)$$

where $\gamma \approx 0.577$ is Euler's constant; and the variance is given by

$$\sigma_Y^2 = \frac{\pi^2}{6\alpha^2}. \qquad (7.103)$$

It is seen from the above that $u$ and $\alpha$ are, respectively, the location and scale 
parameters of the distribution. It is interesting to note that the skewness 
coefficient, defined by Equation (4.11), in this case is

$$\gamma_1 \approx 1.1396,$$

which is independent of $\alpha$ and $u$. This result indicates that the Type-I 
maximum-value distribution has a fixed shape with a dominant tail to the right. 
A typical shape for $f_Y(y)$ is shown in Figure 7.14. 
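Equations (7.102) and (7.103) and the skewness value can be verified by direct numerical integration of the pdf obtained by differentiating Equation (7.101). The sketch below is illustrative only. The midpoint rule, the truncation of the integration range, and the parameter values $\alpha = 1.282$, $u = 1.55$ (borrowed from Example 7.9) are our assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gumbel_max_pdf(y, alpha, u):
    """pdf of the Type-I maximum-value distribution (derivative of Eq. 7.101)."""
    w = math.exp(-alpha * (y - u))
    return alpha * w * math.exp(-w)

def numeric_moments(alpha, u, steps=100000):
    """Mean, variance and skewness coefficient by the midpoint rule."""
    lo, hi = u - 20.0 / alpha, u + 40.0 / alpha   # probability outside is negligible
    h = (hi - lo) / steps
    ys = [lo + (i + 0.5) * h for i in range(steps)]
    ws = [gumbel_max_pdf(y, alpha, u) * h for y in ys]
    mean = sum(w * y for w, y in zip(ws, ys))
    var = sum(w * (y - mean) ** 2 for w, y in zip(ws, ys))
    third = sum(w * (y - mean) ** 3 for w, y in zip(ws, ys))
    return mean, var, third / var ** 1.5

alpha, u = 1.282, 1.55
print(numeric_moments(alpha, u))
print(u + EULER_GAMMA / alpha, math.pi ** 2 / (6 * alpha ** 2))
```

The computed skewness coefficient should agree with 1.1396 for any choice of $\alpha$ and $u$, which reflects the fixed shape noted above.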

The Type-I asymptotic distribution for minimum values is the limiting 
distribution of $Z_n$ in Equation (7.91) as $n \to \infty$ from an initial distribution 
$F_X(x)$ of which the left tail is unbounded and is of exponential type as it decreases 
to zero on the left. An example of $F_X(x)$ that belongs to this class is the normal 
distribution. 

The distribution of $Z_n$ as $n \to \infty$ can be derived by means of procedures 
given above for $Y_n$ through use of a symmetrical argument. Without giving 
details, if we let

$$\lim_{n \to \infty} Z_n = Z, \qquad (7.104)$$

the PDF of $Z$ can be shown to have the form

$$F_Z(z) = 1 - \exp\{-\exp[\alpha(z - u)]\}, \quad -\infty < z < \infty, \qquad (7.105)$$

where $\alpha$ and $u$ are again the two parameters of the distribution. 

It is seen that Type-I asymptotic distributions for maximum and minimum 
values are mirror images of each other. The mode of $Z$ is $u$, and its mean, 
variance, and skewness coefficient are, respectively,

$$m_Z = u - \frac{\gamma}{\alpha}, \qquad \sigma_Z^2 = \frac{\pi^2}{6\alpha^2}, \qquad \gamma_1 \approx -1.1396. \qquad (7.106)$$

Figure 7.14 Typical plot of a Type-I maximum-value distribution


For probability calculations, values for probability distribution functions 
Fyiy) and Fz{z) over various ranges of y and z are available in, for example, 
Microsoft Excel 2000 (see Appendix B). 

Example 7.9. Problem: the maximum daily gasoline demand $Y$ during the 
month of May at a given locality follows the Type-I asymptotic maximum-value 
distribution, with $m_Y = 2$ and $\sigma_Y = 1$, measured in thousands of gallons. 
Determine (a) the probability that the demand will exceed 4000 gallons in 
any day during the month of May, and (b) the daily supply level that for 95% 
of the time will not be exceeded by demand in any given day. 

Answer: it follows from Equations (7.102) and (7.103) that parameters $\alpha$ and 
$u$ are determined from

$$\alpha = \frac{\pi}{\sqrt{6}\,\sigma_Y} = \frac{\pi}{\sqrt{6}} = 1.282, \qquad u = m_Y - \frac{0.577}{\alpha} = 2 - \frac{0.577}{1.282} = 1.55.$$

For part (a), the solution is

$$P(Y > 4) = 1 - F_Y(4) = 1 - \exp\{-\exp[-1.282(4 - 1.55)]\} = 1 - 0.958 = 0.042.$$

For part (b), we need to determine $y$ such that

$$F_Y(y) = P(Y \le y) = 0.95,$$

or

$$\exp\{-\exp[-1.282(y - 1.55)]\} = 0.95. \qquad (7.107)$$

Taking logarithms of Equation (7.107) twice, we obtain

$$y = 1.55 - \frac{\ln(-\ln 0.95)}{1.282} = 3.867;$$

that is, the required supply level is 3867 gallons. 
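The steps of Example 7.9 map directly onto a few lines of code. The sketch below assumes nothing beyond Equations (7.101) to (7.103); the function names are hypothetical. Because the code keeps full precision where the text rounds $\alpha$ and $u$, its answers may differ from 0.042 and 3.867 in the last digit.

```python
import math

def gumbel_max_params(mean, std):
    """Invert Equations (7.102)-(7.103): alpha = pi/(sqrt(6) std), u = mean - gamma/alpha."""
    alpha = math.pi / (math.sqrt(6.0) * std)
    u = mean - 0.5772156649 / alpha
    return alpha, u

def gumbel_max_cdf(y, alpha, u):
    """Type-I maximum-value PDF, Equation (7.101)."""
    return math.exp(-math.exp(-alpha * (y - u)))

def gumbel_max_quantile(p, alpha, u):
    """Solve exp{-exp[-alpha(y - u)]} = p by taking logarithms twice."""
    return u - math.log(-math.log(p)) / alpha

alpha, u = gumbel_max_params(2.0, 1.0)          # thousands of gallons
p_exceed = 1.0 - gumbel_max_cdf(4.0, alpha, u)  # part (a)
y95 = gumbel_max_quantile(0.95, alpha, u)       # part (b)
print(round(alpha, 3), round(u, 2), round(p_exceed, 3), round(y95, 3))
```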

Example 7.10. Problem: consider the problem of estimating floods in the 
design of dams. Let $y_T$ denote the maximum flood associated with return 
period $T$. Determine the relationship between $y_T$ and $T$ if the maximum river 
flow follows the Type-I maximum-value distribution. Recall from Example 6.7 
(page 169) that the return period $T$ is defined as the average number of years 
between floods for which the magnitude is greater than $y_T$. 

Answer: assuming that floods occur independently, the number of years 
between floods with magnitudes greater than $y_T$ assumes a geometric 
distribution. Thus

$$T = \frac{1}{P(Y > y_T)} = \frac{1}{1 - F_Y(y_T)}. \qquad (7.108)$$

Now, from Equation (7.101),

$$F_Y(y_T) = \exp[-\exp(-b)], \qquad (7.109)$$

where $b = \alpha(y_T - u)$. The substitution of Equation (7.109) into Equation 
(7.108) gives the required relationship. 

For values of $y_T$ where $F_Y(y_T) \approx 1$, an approximation can be made by 
noting from Equation (7.109) that

$$\exp(-b) = -\ln F_Y(y_T) = -\left\{[F_Y(y_T) - 1] - \tfrac{1}{2}[F_Y(y_T) - 1]^2 + \cdots\right\}.$$

Since $F_Y(y_T)$ is close to 1, we retain only the first term in the foregoing 
expansion and obtain

$$1 - F_Y(y_T) \approx \exp(-b).$$

Equation (7.108) then gives $T \approx \exp(b)$, that is, $\ln T \approx \alpha(y_T - u)$, and thus the approximate relationship

$$y_T \approx u\left(1 + \frac{1}{\alpha u}\ln T\right), \qquad (7.110)$$

where $u$ is the scale factor and the value of $\alpha u$ describes the characteristics of 
a river; it varies from 1.5 for violent rivers to 10 for stable or mild rivers. 
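To gauge how good the approximation (7.110) is, the exact return period from Equations (7.108) and (7.109) can be evaluated at the approximate flood level. The sketch below uses a hypothetical river with $u = 100$ (in arbitrary flow units) and $\alpha u = 5$, a value inside the range quoted above.

```python
import math

def return_period(y_T, alpha, u):
    """Exact T from Equations (7.108)-(7.109): T = 1 / [1 - F_Y(y_T)]."""
    b = alpha * (y_T - u)
    return 1.0 / (1.0 - math.exp(-math.exp(-b)))

def approx_flood(T, alpha, u):
    """Equation (7.110): y_T ~ u (1 + ln T / (alpha u)) = u + ln(T) / alpha."""
    return u + math.log(T) / alpha

# Hypothetical river: u = 100 flow units and alpha * u = 5, so alpha = 0.05
alpha, u = 0.05, 100.0
for T in (10, 100, 1000):
    y = approx_flood(T, alpha, u)
    print(T, round(y, 1), round(return_period(y, alpha, u), 1))
```

At the approximate level, the exact return period comes out about half a year longer than $T$. The relative error is therefore small once $T$ is in the tens of years or more.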

In closing, let us remark again that the Type-I maximum-value distribution 
is valid for initial distributions of such practical importance as normal, 
lognormal, and gamma distributions. It thus has wide applicability and is sometimes 
simply called the extreme value distribution. 


7.6.2 TYPE-II ASYMPTOTIC DISTRIBUTIONS OF EXTREME 
VALUES 

The Type-II asymptotic distribution of maximum values arises as the limiting 
distribution of $Y_n$ as $n \to \infty$ from an initial distribution of the Pareto type; that 
is, the PDF $F_X(x)$ of each $X_j$ is limited on the left at zero, and its right tail is 
unbounded and approaches one according to

$$F_X(x) = 1 - a x^{-k}, \quad a, k > 0, \quad x > 0. \qquad (7.111)$$

For this class, the asymptotic distribution of $Y_n$, $F_Y(y)$, as $n \to \infty$ takes the 
form

$$F_Y(y) = \exp\left[-\left(\frac{v}{y}\right)^k\right], \quad v, k > 0, \quad y > 0. \qquad (7.112)$$

Let us note that, with $F_X(x)$ given by Equation (7.111), each $X_j$ has moments 
only up to order $r$, where $r$ is the largest integer less than $k$. If $k > 1$, the mean of 
$Y$ is

$$m_Y = v\,\Gamma\!\left(1 - \frac{1}{k}\right), \qquad (7.113)$$

and, if $k > 2$, the variance has the form

$$\sigma_Y^2 = v^2\left[\Gamma\!\left(1 - \frac{2}{k}\right) - \Gamma^2\!\left(1 - \frac{1}{k}\right)\right]. \qquad (7.114)$$


The derivation of Fyiy) given by Equation (7.112) follows in broad outline 
that given for the Type-I maximum-value asymptotic distribution and will not 
be presented here. It has been used as a model in meteorology and hydrology 
(Gumbel, 1958). 

A close relationship exists between the Type-I and Type-II asymptotic 
maximum-value distributions. Let $Y_{\rm I}$ and $Y_{\rm II}$ denote, respectively, these random 
variables. It can be verified, using the techniques of transformations of random 
variables, that they are related by

$$F_{Y_{\rm II}}(y) = F_{Y_{\rm I}}(\ln y), \quad y > 0, \qquad (7.115)$$

where parameters $\alpha$ and $u$ in $F_{Y_{\rm I}}(y)$ are related to parameters $k$ and $v$ in $F_{Y_{\rm II}}(y)$ by

$$u = \ln v \quad \text{and} \quad \alpha = k. \qquad (7.116)$$

When they are continuous, their pdfs obey the relationship

$$f_{Y_{\rm II}}(y) = \frac{1}{y}\, f_{Y_{\rm I}}(\ln y), \quad y > 0. \qquad (7.117)$$
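Equations (7.115) and (7.116) can be confirmed numerically. The sketch below uses $v = 2$ and $k = 3.5$, which are hypothetical parameter values. It evaluates Equation (7.112) directly and compares the result with Equation (7.101) evaluated at $\ln y$, using $u = \ln v$ and $\alpha = k$.

```python
import math

def type2_max_cdf(y, v, k):
    """Type-II maximum-value PDF, Equation (7.112)."""
    return math.exp(-(v / y) ** k) if y > 0 else 0.0

def type1_max_cdf(y, alpha, u):
    """Type-I maximum-value PDF, Equation (7.101)."""
    return math.exp(-math.exp(-alpha * (y - u)))

v, k = 2.0, 3.5   # hypothetical parameter values
for y in (0.5, 1.0, 2.0, 5.0, 20.0):
    lhs = type2_max_cdf(y, v, k)
    rhs = type1_max_cdf(math.log(y), alpha=k, u=math.log(v))   # Eqs. (7.115)-(7.116)
    print(y, lhs, rhs)
```

Substituting $\ln y$ into Equation (7.101) gives $\exp[-e^{-k(\ln y - \ln v)}] = \exp[-(v/y)^k]$. The two columns therefore agree up to rounding.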


The Type-II asymptotic distribution of minimum values arises under analogous 
conditions. With PDF $F_X(x)$ limited on the right at zero and approaching zero 
on the left in a manner analogous to Equation (7.111), we have

$$F_Z(z) = 1 - \exp\left[-\left(\frac{v}{|z|}\right)^k\right], \quad v, k > 0, \quad z < 0. \qquad (7.118)$$


However, it has not been found as useful as its counterparts in Type I and Type III 
as in practice the required initial distributions are not frequently encountered. 


7.6.3 TYPE-III ASYMPTOTIC DISTRIBUTIONS OF EXTREME 
VALUES 

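Since the Type-III maximum-value asymptotic distribution is of limited 
practical interest, only the minimum-value distribution will be discussed here. 

The Type-III minimum-value asymptotic distribution is the limiting 
distribution of $Z_n$ as $n \to \infty$ for an initial distribution $F_X(x)$ in which the left tail 
increases from zero near $x = \varepsilon$ in the manner

$$F_X(x) = c(x - \varepsilon)^k, \quad c, k > 0, \quad x \ge \varepsilon. \qquad (7.119)$$

This class of distributions is bounded on the left at $x = \varepsilon$. The gamma 
distribution is such a distribution with $\varepsilon = 0$. 

Again bypassing derivations, we can show the asymptotic distribution for the 
minimum value to be

$$F_Z(z) = 1 - \exp\left[-\left(\frac{z - \varepsilon}{w - \varepsilon}\right)^k\right], \quad k > 0, \quad w > \varepsilon, \quad z \ge \varepsilon, \qquad (7.120)$$

and, if it is continuous,

$$f_Z(z) = \frac{k}{w - \varepsilon}\left(\frac{z - \varepsilon}{w - \varepsilon}\right)^{k-1}\exp\left[-\left(\frac{z - \varepsilon}{w - \varepsilon}\right)^k\right], \quad k > 0, \quad w > \varepsilon, \quad z \ge \varepsilon. \qquad (7.121)$$

The mean and variance of $Z$ are

$$m_Z = \varepsilon + (w - \varepsilon)\,\Gamma\!\left(1 + \frac{1}{k}\right), \qquad \sigma_Z^2 = (w - \varepsilon)^2\left[\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma^2\!\left(1 + \frac{1}{k}\right)\right]. \qquad (7.122)$$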


We have seen in Section 7.4.1 that the exponential distribution is used as a 
failure law in reliability studies, which corresponds to a constant hazard 
function [see Equations (7.64) and (7.66)]. The distribution given by Equations 
(7.120) and (7.121) is frequently used as a generalized time-to-failure model 
for cases in which the hazard function varies with time. One can show that the 
hazard function

$$h(t) = \frac{k}{w}\left(\frac{t}{w}\right)^{k-1}, \quad w, k > 0, \quad t > 0, \qquad (7.123)$$

is capable of assuming a wide variety of shapes, and its associated probability 
density function for $T$, the time to failure, is given by

$$f_T(t) = \frac{k}{w}\left(\frac{t}{w}\right)^{k-1}\exp\left[-\left(\frac{t}{w}\right)^k\right], \quad w, k > 0, \quad t > 0. \qquad (7.124)$$

It is the so-called Weibull distribution, after Weibull, who first obtained it 
heuristically (Weibull, 1939). Clearly, Equation (7.124) is a special case of 
Equation (7.121), with $\varepsilon = 0$. 
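The shapes available to Equation (7.123) can be seen by tabulating $h(t)$ for a few values of $k$. The following sketch assumes the usual definition of the hazard function, $h(t) = f_T(t)/[1 - F_T(t)]$, and verifies that Equations (7.120) (with $\varepsilon = 0$), (7.123), and (7.124) satisfy it. The parameter values are hypothetical.

```python
import math

def weibull_cdf(t, w, k):
    """Equation (7.120) with epsilon = 0."""
    return 1.0 - math.exp(-(t / w) ** k)

def weibull_pdf(t, w, k):
    """Equation (7.124)."""
    return (k / w) * (t / w) ** (k - 1) * math.exp(-(t / w) ** k)

def hazard(t, w, k):
    """Equation (7.123)."""
    return (k / w) * (t / w) ** (k - 1)

# Shape of h(t) for w = 1: decreasing (k < 1), constant (k = 1), increasing (k > 1)
for k in (0.5, 1.0, 2.0):
    print(k, [round(hazard(t, 1.0, k), 3) for t in (0.5, 1.0, 2.0)])
```

With $k = 1$ the hazard is constant and Equation (7.124) reduces to the exponential failure law. Values of $k$ below or above 1 give decreasing or increasing failure rates, respectively.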

The relationship between Type-III and Type-I minimum-value asymptotic 
distributions can also be established. Let $Z_{\rm I}$ and $Z_{\rm III}$ be the random variables 
having, respectively, Type-I and Type-III asymptotic distributions of minimum 
values. Then

$$F_{Z_{\rm III}}(z) = F_{Z_{\rm I}}[\ln(z - \varepsilon)], \quad z > \varepsilon, \qquad (7.125)$$

with $u = \ln(w - \varepsilon)$ and $\alpha = k$. If they are continuous, the relationship between 
their pdfs is

$$f_{Z_{\rm III}}(z) = \frac{1}{z - \varepsilon}\, f_{Z_{\rm I}}[\ln(z - \varepsilon)], \quad z > \varepsilon. \qquad (7.126)$$
One final remark to be made is that asymptotic distributions of maximum and 
minimum values from the same initial distribution may not be of the same type. 
For example, for a gamma initial distribution, its asymptotic maximum-value 
distribution is of Type I whereas the minimum-value distribution falls into Type 
III. With reference to system time-to-failure models, a system having n components 
in series with independent gamma life distributions for its components will have a 
time-to-failure distribution belonging to the Type-III asymptotic minimum-value 
distribution as n becomes large. The corresponding model for a system having n 
components in parallel is the Type-I asymptotic maximum-value distribution. 


7.7 SUMMARY 

As in Chapter 6, it is useful to summarize the important properties associated 
with some of the important continuous distributions discussed in this chapter. 
These are given in Table 7.1. 
FURTHER READING AND COMMENTS 

As we mentioned in Section 7.2.1, the central limit theorem as stated may be generalized 
random variables. See, for example, the following two references: 

Loeve, M., 1955, Probability Theory, Van Nostrand, New York. 
Extensive probability tables exist in addition to those given in Appendix A. 
PROBLEMS 

7.1 The random variables X and Y are independent and uniformly distributed in 
interval (0, 1). Determine the probability that their product XY is less than 1/2. 

7.2 The characteristic function (CF) of a random variable Y uniformly distributed in the 
interval (−1, 1) is

$$\phi_Y(t) = \frac{\sin t}{t}.$$

(a) Find the CF of Y if it is uniformly distributed in interval (−a, a). 

(b) Find the CF of Y if it is uniformly distributed in interval (a, a + b). 

7.3 A machine component consisting of a rod-and-sleeve assembly is shown in Figure 
7.15. Owing to machining inaccuracies, the inside diameter of the sleeve is uniformly 
distributed in the interval (1.98 cm, 2.02 cm), and the rod diameter is also uniformly 
distributed in the interval (1.95 cm, 2.00 cm). Assuming independence of these two 
distributions, find the probability that: 

(a) The rod diameter is smaller than the sleeve diameter. 

(b) There is at least a 0.01 cm clearance between the rod and the sleeve. 



Figure 7.15 Rod and sleeve arrangement, for Problem 7.3 
7.4 Repeat Problem 7.3 if the distribution of the rod diameter remains uniform but 
that of the sleeve inside diameter is N(2 cm, 0.0004 cm²). 

7.5 The first mention of the normal distribution was made in the work of de Moivre in 
1733 as one method of approximating probabilities of a binomial distribution when 
n is large. Show that this approximation is valid and give an example showing 
results of this approximation. 

7.6 If the distribution of temperature T of a given volume of gas is N(400, 1600), 
measured in degrees Fahrenheit, find: 

(a) $f_T(450)$; 

(b) $P(T < 450)$; 

(c) $P(|T - m_T| < 20)$; 

(d) $P(|T - m_T| < 20 \mid T > 300)$. 

7.7 If X is a random variable and distributed as $N(m, \sigma^2)$, show that 



7.8 Let random variables X and Y be identically and normally distributed. Show that 
random variables X + Y and X — Y are independent. 

7.9 Suppose that the useful lives measured in hours of two electronic devices, say $T_1$ 
and $T_2$, have distributions N(40, 36) and N(45, 9), respectively. If the electronic 
device is to be used for a 45-hour period, which is to be preferred? Which is 
preferred if it is to be used for a 48-hour period? 

7.10 Verify Equation (7.13) for normal random variables. 

7.11 Let random variables $X_1, X_2, \ldots, X_n$ be jointly normal with zero means. Show that

$$E\{X_1X_2X_3\} = 0,$$

$$E\{X_1X_2X_3X_4\} = E\{X_1X_2\}E\{X_3X_4\} + E\{X_1X_3\}E\{X_2X_4\} + E\{X_1X_4\}E\{X_2X_3\}.$$

Generalize the results above and verify Equation (7.35). 

7.12 Two rods, for which the lengths are independently, identically, and normally 
distributed random variables with means 4 inches and variances 0.02 square inches, 
are placed end to end. 

(a) What is the distribution of the total length? 

(b) What is the probability that the total length will be between 7.9 inches and 8.1 
inches? 

7.13 Let random variables $X_1$, $X_2$, and $X_3$ be independent and distributed according 
to N(0, 1), N(1, 1), and N(2, 1), respectively. Determine probability 
$P(X_1 + X_2 + X_3 > 1)$. 

7.14 A rope with 100 strands supports a weight of 2100 pounds. If the breaking strength 
of each strand is random, with mean equal to 20 pounds and standard deviation 4 
pounds, and if the breaking strength of the rope is the sum of the independent 
breaking strengths of its strands, determine the probability that the rope will not 
fail under the load. (Assume there is no individual strand breakage before rope 
failure.) 
7.15 If $X_1, \ldots, X_n$ are independent random variables, all having distribution $N(m, \sigma^2)$, 
determine the conditions that must be imposed on $c_1, c_2, \ldots, c_n$ such that the sum

$$Y = c_1X_1 + c_2X_2 + \cdots + c_nX_n$$

is also $N(m, \sigma^2)$. Can all $c$s be positive? 

7.16 Let U be the standardized normal random variable, and define $X = |U|$. Then, X 
is called the folded standardized normal random variable. Determine $f_X(x)$. 

7.17 The Cauchy distribution has the form

$$f_X(x) = \frac{1}{\pi(1 + x^2)}, \quad -\infty < x < \infty.$$

(a) Show that it arises from the ratio $X_1/X_2$, where $X_1$ and $X_2$ are independent and 
distributed as $N(0, \sigma^2)$. 

(b) Show that the moments of X do not exist. 

7.18 Let $X_1$ and $X_2$ be independent normal random variables, both with mean 0 and 
standard deviation 1. Prove that

$$Y = \arctan\frac{X_1}{X_2}$$

is uniformly distributed from $-\pi/2$ to $\pi/2$. 

7.19 Verify Equations (7.48) for the lognormal distribution. 

7.20 The lognormal distribution is found to be a good model for strains in structural 
members caused by wind loads. Let the strain be represented by X, with $m_X = 1$ 
and $\sigma_X^2 = 0.09$. 

(a) Determine the probability $P(X > 1.2)$. 

(b) If stress Y in a structural member is related to the strain by $Y = a + bX$, with 
$b > 0$, determine $f_Y(y)$ and $m_Y$. 

7.21 Arrivals at a rural entrance booth to the New York State Thruway are considered 
to be Poisson distributed with a mean arrival rate of 20 vehicles per hour. The time 
to process an arrival is approximately exponentially distributed with a mean time of 
one min. 

(a) What percentage of the time is the tollbooth operator free to work on 
operational reports? 

(b) How many cars are expected to be waiting to be processed, on average, per hour? 

(c) What is the average time a driver waits in line before paying the toll? 

(d) Whenever the average number of waiting vehicles reaches 5, a second tollbooth 
will be opened. How much will the average hourly rate of arrivals have to 
increase to require the addition of a second operator? 

7.22 The life of a power transmission tower is exponentially distributed, with mean life 
25 years. If three towers, operated independently, are being erected at the same 
time, what is the probability that at least 2 will still stand after 35 years? 

7.23 For a gamma-distributed random variable, show that: 

(a) Its mean and variance are those given by Equation (7.57). 

(b) It has a positive skewness. 
7.24 Show that, if is a positive integer, the probability distribution function (PDF) of 
a gamma-distributed random variable X can be written as 



Recognize that the terms in the sum take the form of the Poisson mass function and 
therefore can be calculated with the aid of probability tables for Poisson 
distributions. 

7.25 The system shown in Figure 7.16 has three redundant components, A, B, and C. Let their 
operating lives (in hours) be denoted by $T_1$, $T_2$, and $T_3$, respectively. If the 
redundant parts come into operation only when the online component fails (cold 
redundancy), then the operating life of the system, T, is $T = T_1 + T_2 + T_3$. 
Let $T_1$, $T_2$, and $T_3$ be independent random variables, each distributed as 



7 / 100 , for iy > 0, 7 = 1,2,3; 


0, otherwise. 


Determine the probability that the system will operate at least 300 hours. 

7.26 We showed in Section 7.4.1 that an exponential failure law leads to a constant 
failure rate. Show that the converse is also true; that is, if $h(t)$ as defined by 
Equation (7.65) is a constant then the time to failure T is exponentially distributed. 

7.27 A shifted exponential distribution is defined as an exponential distribution shifted 
to the right by an amount a; that is, if random variable X has an exponential 
distribution with 



random variable Y has a shifted exponential distribution if $f_Y(y)$ has the same 
shape as $f_X(x)$ but its nonzero portion starts at point a rather than zero. Determine 
the relationship between X and Y and probability density function (pdf) $f_Y(y)$. 
What are the mean and variance of Y? 


Figure 7.16 System of components, for Problem 7.25 
7.28 Let random variable X be $\chi^2$-distributed with parameter n. Show that the limiting 
distribution of

$$\frac{X - n}{\sqrt{2n}}$$

as $n \to \infty$ is N(0, 1). 

7.29 Let $X_1, X_2, \ldots, X_n$ be independent random variables with common PDF $F_X(x)$ 
and pdf $f_X(x)$. Equations (7.89) and (7.91) give, respectively, the PDFs of their 
maximum and minimum values. Let $X_{(j)}$ be the random variable denoting the 
$j$th-smallest value of $X_1, X_2, \ldots, X_n$. Show that the PDF of $X_{(j)}$ has the form

$$F_{X_{(j)}}(x) = \sum_{k=j}^{n} \binom{n}{k} F_X^k(x)\,[1 - F_X(x)]^{n-k}, \quad j = 1, 2, \ldots, n.$$


7.30 Ten points are distributed uniformly and independently in interval (0, 1). Find: 

(a) The probability that the point lying farthest to the right is to the left of 3/4. 
(b) The probability that the point lying next farthest to the right is to the right of 1/2. 

7.31 Let the number of arrivals in a time interval obey the distribution given in Problem 
6.32, which corresponds to a Poisson-type distribution with a time-dependent 
mean rate of arrival. Show that the pdf of time between arrivals is given by 


frU) 



0, elsewhere. 


As we see from Equation (7.124), it is the Weibull distribution. 

7.32 A multiple-member structure in a parallel arrangement, as shown in Figure 7.17, 
supports a load s. It is assumed that all members share the load equally, that their 
resistances are random and identically distributed with common PDF $F_R(r)$, and 
that they act independently. If a member fails when the load it supports exceeds 
its resistance, show that the probability that failure will occur to $n - k$ members 
among n initially existing members is 



where 






(s), n> i > j > k. 
Figure 7.17 Structure under load s, for Problem 7.32 


7.33 What is the probability sought in Problem 7.32 if the load is also a random variable 
S with pdf $f_S(s)$? 

7.34 Let n = 3 in Problem 7.32. Determine the probabilities of failure in zero, one, two, 
and three members in Problem 7.32 if R follows a uniform distribution over 
interval (80,100), and s = 270. Is partial failure (one-member or two-member 
failure) possible in this case? 

7.35 To show that, as a time-to-failure model, the Weibull distribution corresponds to 
a wide variety of shapes of the hazard function, graph the hazard function in Equation 
(7.123) and the corresponding Weibull distribution in Equation (7.124) for the 
following combinations of parameter values: k = 0.5, 1, 2, and 3; and w = 1 and 2. 

7.36 The ranges of n independent test flights of a supersonic aircraft are assumed to be 
identically distributed with PDF $F_X(x)$ and pdf $f_X(x)$. If range span is defined as the 
distance between the maximum and minimum ranges of these n values, determine 
the pdf of the range span in terms of $F_X(x)$ or $f_X(x)$. Expressing it mathematically, 
the pdf of interest is that of S, where

$$S = Y - Z,$$

with

$$Y = \max(X_1, X_2, \ldots, X_n),$$

and

$$Z = \min(X_1, X_2, \ldots, X_n).$$

Note that random variables Y and Z are not independent. 
Observed Data and Graphical 
Representation 


Referring to Figure 1.1 in Chapter 1, we are concerned in this and subsequent 
chapters with step D → E of the basic cycle in probabilistic modeling, that is, 
parameter estimation and model verification on the basis of observed data. In 
Chapters 6 and 7, our major concern has been the selection of an appropriate 
model (probability distribution) to represent a physical or natural 
phenomenon based on our understanding of its underlying properties. In order to 
specify the model completely, however, it is required that the parameters in the 
distribution be assigned. We now consider this problem of parameter 
estimation using available data. Included in this discussion are techniques for 
assessing the reasonableness of a selected model and the problem of selecting a 
model from among a number of contending distributions when no single one 
is preferred on the basis of the underlying physical characteristics of a given 
phenomenon. 

Let us emphasize at the outset that, owing to the probabilistic nature of the 
situation, the problem of parameter estimation is precisely that: an 
estimation problem. A sequence of observations, say n in number, is a sample of 
observed values of the underlying random variable. If we were to repeat the 
sequence of n observations, the random nature of the experiment should 
produce a different sample of observed values. Any reasonable rule for 
extracting parameter estimates from a set of n observations will thus give 
different estimates for different sets of observations. In other words, no single 
sequence of observations, finite in number, can be expected to yield true 
parameter values. What we are basically interested in, therefore, is to obtain 
relevant information about the distribution parameters by actually observing 
the underlying random phenomenon and using these observed numerical 
values in a systematic way. 
8.1 HISTOGRAM AND FREQUENCY DIAGRAMS

Given a set of independent observations $x_1, x_2, \ldots, x_n$ of a random variable
X, a useful first step is to organize and present them so that they can be easily
interpreted and evaluated. When there is a large number of observed data, a
histogram is an excellent graphical representation of the data, facilitating
(a) an evaluation of the adequacy of the assumed model, (b) estimation of
percentiles of the distribution, and (c) estimation of the distribution parameters.

Let us consider, for example, a chemical process that is producing batches of
a desired material; 200 observed values of the percentage yield, X, representing
a relatively large sample size, are given in Table 8.1 (Hill, 1975). The sample
values vary from 64 to 76. Dividing this range into 12 equal intervals and
plotting the total number of observed yields in each interval as the height of
a rectangle over the interval results in the histogram shown in Figure 8.1.
A frequency diagram is obtained if the ordinate of the histogram is divided by
the total number of observations, 200 in this case, and by the interval width Δ
(which happens to be one in this example), so that the total area under the
frequency diagram is one. We see that the histogram or the frequency diagram
gives an immediate impression of the range, relative frequency, and scatter
associated with the observed data.
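The construction just described can be sketched in a few lines of Python. The function below, a hypothetical helper named `histogram_and_frequency`, counts the observations falling in each of k equal intervals over [lo, hi] (the histogram) and divides each count by n·Δ (the frequency diagram). The half-open interval convention, with a value equal to hi placed in the last interval, is an assumption of this sketch; the usage example takes only the first ten batches of Table 8.1, not the full sample.

```python
def histogram_and_frequency(data, lo, hi, k):
    """Return (counts, heights) for k equal intervals spanning [lo, hi].

    counts[j]  : number of observations in interval j (the histogram)
    heights[j] : counts[j] / (n * width) (the frequency diagram), so the
                 rectangle areas sum to one
    Intervals are half-open, [a, a + width); a value equal to hi is put in
    the last interval (a convention chosen for this sketch).
    """
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        if not lo <= x <= hi:
            raise ValueError(f"observation {x} lies outside [{lo}, {hi}]")
        j = min(int((x - lo) / width), k - 1)  # index of the containing interval
        counts[j] += 1
    n = len(data)
    heights = [c / (n * width) for c in counts]
    return counts, heights


# First ten batches of Table 8.1, 12 unit-width intervals from 64% to 76%
yields = [68.4, 69.1, 71.0, 69.3, 72.9, 72.5, 71.1, 68.6, 70.6, 70.9]
counts, heights = histogram_and_frequency(yields, 64.0, 76.0, 12)
print(counts)  # [0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0]
print(sum(h * 1.0 for h in heights))  # total area, 1.0 up to rounding
```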

Figure 8.1 Histogram and frequency diagram for percentage yield
(data source: Hill, 1975)

Table 8.1 Chemical yield data (data source: Hill, 1975)

Batch  Yield   Batch  Yield   Batch  Yield   Batch  Yield   Batch  Yield
no.    (%)     no.    (%)     no.    (%)     no.    (%)     no.    (%)
1      68.4    41     68.7    81     68.5    121    73.3    161    70.5
2      69.1    42     69.1    82     71.4    122    75.8    162    68.8
3      71.0    43     69.3    83     68.9    123    70.4    163    72.9
4      69.3    44     69.4    84     67.6    124    69.0    164    69.0
5      72.9    45     71.1    85     72.2    125    72.2    165    68.1
6      72.5    46     69.4    86     69.0    126    69.8    166    67.7
7      71.1    47     75.6    87     69.4    127    68.3    167    67.1
8      68.6    48     70.1    88     73.0    128    68.4    168    68.1
9      70.6    49     69.0    89     71.9    129    70.0    169    71.7
10     70.9    50     71.8    90     70.7    130    70.9    170    69.0
11     68.7    51     70.1    91     67.0    131    72.6    171    72.0
12     69.5    52     64.7    92     71.1    132    70.1    172    71.5
13     72.6    53     68.2    93     71.8    133    68.9    173    74.9
14     70.5    54     71.3    94     67.3    134    64.6    174    78.7
15     68.5    55     71.6    95     71.9    135    72.5    175    69.0
16     71.0    56     70.1    96     70.3    136    73.5    176    70.8
17     74.4    57     71.8    97     70.0    137    68.6    177    70.0
18     68.8    58     72.5    98     70.3    138    68.6    178    70.3
19     72.4    59     71.1    99     72.9    139    64.7    179    67.5
20     69.2    60     67.1    100    68.5    140    65.9    180    71.7
21     69.5    61     70.6    101    69.8    141    69.3    181    74.0
22     69.8    62     68.0    102    67.9    142    70.3    182    67.6
23     70.3    63     69.1    103    69.8    143    70.7    183    71.1
24     69.0    64     71.7    104    66.5    144    65.7    184    64.6
25     66.4    65     72.2    105    67.5    145    71.1    185    74.0
26     72.3    66     69.7    106    71.0    146    70.4    186    67.9
27     74.4    67     68.3    107    72.8    147    69.2    187    68.5
28     69.2    68     68.7    108    68.1    148    73.7    188    73.4
29     71.0    69     73.1    109    73.6    149    68.5    189    70.4
30     66.5    70     69.0    110    68.0    150    68.5    190    70.7
31     69.2    71     69.8    111    69.6    151    70.7    191    71.6
32     69.0    72     69.6    112    70.6    152    72.3    192    66.9
33     69.4    73     70.2    113    70.0    153    71.4    193    72.6
34     71.5    74     68.4    114    68.5    154    69.2    194    72.2
35     68.0    75     68.7    115    68.0    155    73.9    195    69.1
36     68.2    76     72.0    116    70.0    156    70.2    196    71.3
37     71.1    77     71.9    117    69.2    157    69.6    197    67.9
38     72.0    78     74.1    118    70.3    158    71.6    198    66.1
39     68.3    79     69.3    119    67.2    159    69.7    199    70.8
40     70.6    80     69.0    120    70.7    160    71.2    200    69.5

In the case of a discrete random variable, the histogram and frequency diagram
obtained from observed data take the shape of a bar chart as opposed to connected
rectangles in the continuous case. Consider, for example, the distribution of the
number of accidents per driver during a six-year time span in California. The data
given in Table 8.2 are six-year accident records of 7842 California drivers (Burg,
1967, 1968). Based upon this set of observations, the histogram has the form given
in Figure 8.2. The frequency diagram is obtained in this case simply by dividing
the ordinate of the histogram by the total number of observations, which is 7842.
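For a discrete variable the frequency diagram needs no interval width. A minimal Python sketch using the counts of Table 8.2, keeping the '>5' category as a single bar (an assumption made here because the table does not break it down further):

```python
# Six-year accident record of Table 8.2 (Burg, 1967, 1968)
drivers_by_accidents = {"0": 5147, "1": 1859, "2": 595, "3": 167,
                        "4": 54, "5": 14, ">5": 6}

total = sum(drivers_by_accidents.values())

# Frequency diagram: histogram ordinates divided by the total number of drivers
relative_frequency = {k: v / total for k, v in drivers_by_accidents.items()}

print(total)  # 7842
for k, f in relative_frequency.items():
    print(f"{k:>3} accidents: {f:.4f}")
```

The check that the counts add up to 7842 is a useful guard against transcription errors when the table is entered by hand.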
Returning now to the chemical yield example, the frequency diagram shown in
Figure 8.1 has the familiar properties of a probability density function (pdf).
Hence, probabilities associated with various events can be estimated. For
example, the probability of a batch having less than 68% yield can be read off
the frequency diagram by summing the areas to the left of 68%, giving 0.13
(0.02 + 0.01 + 0.025 + 0.075). Similarly, the probability of a batch having a
yield greater than 72% is 0.18 (0.105 + 0.035 + 0.03 + 0.01). Let us remember,
however, that these probabilities are calculated from the observed data. A
different set of data obtained from the same chemical process would in general
lead to a different frequency diagram and hence different values for these
probabilities. Consequently, they are, at best, estimates of the probabilities
$P(X < 68)$ and $P(X > 72)$ associated with the underlying random variable X.
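The area-summing step can be written out explicitly. A short Python sketch that uses only the rectangle heights quoted above with unit interval width, as in Figure 8.1; the helper name `area` is ours, not the text's:

```python
def area(heights, width=1.0):
    """Total area of frequency-diagram rectangles with the given heights."""
    return sum(h * width for h in heights)

# Heights (interval width 1) quoted in the text for the four intervals
# below 68% and the four intervals above 72%
p_below_68 = area([0.02, 0.01, 0.025, 0.075])
p_above_72 = area([0.105, 0.035, 0.03, 0.01])
print(round(p_below_68, 3), round(p_above_72, 3))  # 0.13 0.18
```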

A remark on the choice of the number of intervals for plotting histograms
and frequency diagrams is in order. For this example, the choice of 12 intervals is
convenient on account of the range of values spanned by the observations and of
the fact that the resulting resolution is adequate for the probability calculations
carried out above. In Figure 8.3, a histogram is constructed using 4 intervals
instead of 12 for the same example. It is easy to see that it projects quite a different,
and less accurate, visual impression of data behavior. It is thus important to
choose the number of intervals consistent with the information one wishes to
extract from the mathematical model. As a practical guide, Sturges (1926) suggests
that an approximate value for the number of intervals, k, be determined from

$$k = 1 + 3.3\log_{10} n, \qquad (8.1)$$

where n is the sample size.
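Equation (8.1) is easy to evaluate directly. A short Python sketch (the function name is ours):

```python
import math

def sturges_intervals(n):
    """Approximate number of histogram intervals from Equation (8.1)."""
    return 1 + 3.3 * math.log10(n)

k = sturges_intervals(200)
print(round(k, 2), round(k))  # 8.59 9
```

For the 200 yields of Table 8.1 the rule suggests about 9 intervals, somewhat fewer than the 12 chosen above for convenience; the rule is a starting point, not a prescription.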

Table 8.2 Six-year accident record for 7842 California drivers
(data source: Burg, 1967, 1968)

Number of accidents   Number of drivers
0                     5147
1                     1859
2                     595
3                     167
4                     54
5                     14
>5                    6
Total                 7842

Figure 8.2 Histogram from six-year accident data; horizontal axis: number of
accidents in six years (data source: Burg, 1967, 1968)

Figure 8.3 Histogram for percentage yield with four intervals (data source: Hill, 1975)

From the modeling point of view, it is reasonable to select a normal distribution
as the probabilistic model for percentage yield X by observing that its random
variations are the resultant of numerous independent random sources in the
chemical manufacturing process. Whether or not this is a reasonable selection can be
evaluated in a subjective way by using the frequency diagram given in Figure 8.1.
The normal density function with mean 70 and variance 4 is superimposed on the
frequency diagram in Figure 8.1, which shows a reasonable match. Based on this
normal distribution, we can calculate the probabilities given above, giving a further
assessment of the adequacy of the model. For example, with the aid of Table A.3,

$$P(X < 68) = F_U\left(\frac{68 - 70}{2}\right) = F_U(-1) = 1 - F_U(1) = 0.159,$$

which compares with 0.13 obtained from the frequency diagram.
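Instead of Table A.3, the standard normal distribution function can be evaluated through the error function, $F_U(u) = \tfrac{1}{2}[1 + \mathrm{erf}(u/\sqrt{2})]$. The Python sketch below repeats the calculation for $P(X < 68)$ and, under the same fitted N(70, 4) model, also gives $P(X > 72)$, which can be set against the frequency-diagram value 0.18:

```python
import math

def normal_cdf(x, m, sigma):
    """P(X <= x) for X ~ N(m, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

m, sigma = 70.0, 2.0   # mean 70, variance 4
p_below_68 = normal_cdf(68.0, m, sigma)
p_above_72 = 1.0 - normal_cdf(72.0, m, sigma)
print(f"{p_below_68:.3f} {p_above_72:.3f}")  # 0.159 0.159
```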

In the above, the choice of 70 and 4, respectively, as estimates of the mean
and variance of X is made by observing that the mean of the distribution should
be close to the arithmetic mean of the sample, that is,

$$m \cong \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad (8.2)$$

and the variance can be approximated by

$$\sigma^2 \cong \frac{1}{n}\sum_{j=1}^{n} (x_j - m)^2, \qquad (8.3)$$

which gives the arithmetic average of the squares of the sample values with respect
to their arithmetic mean.
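Equations (8.2) and (8.3) translate directly into code. The Python sketch below is illustrative only: it uses the 1/n divisor exactly as written in Equation (8.3), and its usage example takes just the first five batches of Table 8.1 rather than the full sample.

```python
def mean_and_variance(xs):
    """Arithmetic mean (Equation 8.2) and the 1/n average of squared
    deviations about that mean (Equation 8.3)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return m, var

# First five batches of Table 8.1
m_hat, var_hat = mean_and_variance([68.4, 69.1, 71.0, 69.3, 72.9])
print(round(m_hat, 2), round(var_hat, 4))  # 70.14 2.6344
```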

Let us emphasize that our use of Equations (8.2) and (8.3) is guided largely
by intuition. Clearly, we need to address the problem of estimating the parameter
values in an objective and more systematic fashion. In addition, procedures
need to be developed that permit us to assess the adequacy of the normal model
chosen for this example. These are subjects of discussion in the chapters to follow.
PROBLEMS

8.1 It has been shown that the frequency diagram gives a graphical representation of the 
probability density function. Use the data given in Table 8.1 and construct a diagram 
that approximates the probability distribution function of percentage yield X. 

8.2 In parts (a)-(l) below, observations or sample values of size n are given for a random 
phenomenon. 

(i) If not already given, plot the histogram and frequency diagram associated with 
the designated random variable X. 

(ii) Based on the shape of these diagrams and on your understanding of the
underlying physical situation, suggest one probability distribution (normal,
Poisson, gamma, etc.) that may be appropriate for X. Estimate parameter
value(s) by means of Equations (8.2) and (8.3) and, for the purposes of
comparison, plot the proposed probability density function (pdf) or probability
mass function (pmf) and superimpose it on the frequency diagram.

(a) X is the maximum annual flood flow of the Feather River at Oroville, CA. 
Data given in Table 8.3 are records of maximum flood flows in 1000 cfs for 
the years 1902 to 1960 (source: Benjamin and Cornell, 1970). 

(b) X is the number of accidents per driver during a six-year time span in 
California. Data are given in Table 8.2 for 7842 drivers. 

(c) X is the time gap in seconds between cars on a stretch of highway. Table 8.4 
gives measurements of time gaps in seconds between successive vehicles at 
a given location (n = 100). 

(d) X is the sum of two successive gaps in Part (c) above. 

(e) X is the number of vehicles arriving per minute at a toll booth on the New York
State Thruway. Measurements of 105 one-minute arrivals are given in
Table 8.5.

(f) X is the number of five-minute arrivals in Part (e) above. 

(g) X is the amount of yearly snowfall in inches in Buffalo, NY. Given in Table 8.6 
are recorded snowfalls in inches from 1909 to 2002. 

(h) X is the peak combustion pressure in kPa per cycle. In spark ignition 
engines, cylinder pressure during combustion varies from cycle to cycle. 
The histogram of peak combustion pressure in kPa is shown in Figure 8.4 
for 280 samples (source: Chen and Krieger, 1976). 
(i) $X_1$, $X_2$, and $X_3$ are annual premiums paid by low-risk, medium-risk, and
high-risk drivers. The frequency diagram for each group is given in Figure 8.5
(simulated results, over 50 years, are from Ferreira, 1974).

(j) X is the number of blemishes in a certain type of image tube for television;
58 data points were used to construct the histogram shown in Figure 8.6
(source: Link, 1972).

(k) X is the difference between observed and computed urinary digitoxin
excretion, in micrograms per day. In a study of the metabolism of digitoxin
to digoxin in patients, long-term studies of urinary digitoxin excretion were
carried out on four patients. A histogram of the difference between
observed and computed urinary digitoxin excretion in micrograms per day
is given in Figure 8.7 (n = 100) (source: Jelliffe et al., 1970).

(l) X is the live load in pounds per square foot (psf) in warehouses. The
histogram in Figure 8.8 represents 220 measurements of live loads on
different floors of a warehouse over bays of areas of approximately 400
square feet (source: Dunham, 1952).

Table 8.3 Maximum flood flows (in 1000 cfs), 1902-60 (source:
Benjamin and Cornell, 1970)

Year   Flood   Year   Flood   Year   Flood
1902   42      1922   36      1942   110
1903   102     1923   22      1943   108
1904   118     1924   42      1944   25
1905   81      1925   64      1945   60
1906   128     1926   56      1946   54
1907   230     1927   94      1947   46
1908   16      1928   185     1948   37
1909   140     1929   14      1949   17
1910   31      1930   80      1950   46
1911   75      1931   12      1951   92
1912   16      1932   23      1952   13
1913   17      1933   9       1953   59
1914   122     1934   20      1954   113
1915   81      1935   59      1955   55
1916   42      1936   85      1956   203
1917   80      1937   19      1957   83
1918   28      1938   185     1958   102
1919   66      1939   8       1959   35
1920   23      1940   152     1960   135
1921   62      1941   84

Table 8.4 Time gaps between vehicles (in seconds)

4.1  3.5  2.2  2.7  2.7  4.1  3.4  1.8  3.1  2.1
2.1  1.7  2.3  3.0  4.1  3.2  2.2  2.3  1.5  1.1
2.5  4.7  1.8  4.8  1.8  4.0  4.9  3.1  5.7  5.7
3.1  2.0  2.9  5.9  2.1  3.0  4.4  2.1  2.6  2.7
3.2  2.5  1.7  2.0  2.7  1.2  9.0  1.8  2.1  5.4
2.1  3.8  4.5  3.3  2.1  2.1  7.1  4.7  3.1  1.7
2.2  3.1  1.7  3.1  2.3  8.1  5.7  2.2  4.0  2.7
1.5  1.7  4.0  6.4  1.5  2.2  1.2  5.1  2.7  2.4
1.7  1.2  2.7  7.0  3.9  5.2  2.7  3.5  2.9  1.2
1.5  2.7  2.9  4.1  3.1  1.9  4.8  4.0  3.0  2.7

Table 8.5 Arrivals per minute at a New York State Thruway toll booth

 9   9  11  15   6  11   9   6  11   8
10   3   9   8   5   7  15   7  14   6
 6  16   6   8  10   6  10  11   9   7
 7  11  10   3   8   4   7  15   6   7
 7   8   7   5  13  12  11  10   8  14
 3  15  13   5   7  12   7  10   4  16
 7  11  11  13  10   9  10  11   6   6
 8   9   5   5   5  11   6   7   9   5
12  12   4  13   4  12  16  10  14  15
16  10   8  10   6  18  13   6   9   4
13  14   6  10  10

Table 8.6 Annual snowfall, in inches, in Buffalo, NY, 1909-2002

Year       Snowfall  Year       Snowfall  Year       Snowfall
1909-1910  126.4     1939-1940  77.8      1969-1970  120.5
1910-1911  82.4      1940-1941  79.3      1970-1971  97.0
1911-1912  78.1      1941-1942  89.6      1971-1972  109.9
1912-1913  51.1      1942-1943  85.5      1972-1973  78.8
1913-1914  90.9      1943-1944  58.0      1973-1974  88.7
1914-1915  76.2      1944-1945  120.7     1974-1975  95.6
1915-1916  104.5     1945-1946  110.5     1975-1976  82.5
1916-1917  87.4      1946-1947  65.4      1976-1977  199.4
1917-1918  110.5     1947-1948  39.9      1977-1978  154.3
1918-1919  25.0      1948-1949  40.1      1978-1979  97.3
1919-1920  69.3      1949-1950  88.7      1979-1980  68.4
1920-1921  53.5      1950-1951  71.4      1980-1981  60.9
1921-1922  39.8      1951-1952  83.0      1981-1982  112.4
1922-1923  63.6      1952-1953  55.9      1982-1983  52.4
1923-1924  46.7      1953-1954  89.9      1983-1984  132.5
1924-1925  72.9      1954-1955  84.6      1984-1985  107.2
1925-1926  74.6      1955-1956  105.2     1985-1986  114.7
1926-1927  83.6      1956-1957  113.7     1986-1987  67.5
1927-1928  80.7      1957-1958  124.7     1987-1988  56.4
1928-1929  60.3      1958-1959  114.5     1988-1989  67.4
1929-1930  79.0      1959-1960  115.6     1989-1990  93.7
1930-1931  74.4      1960-1961  102.4     1990-1991  57.5
1931-1932  49.6      1961-1962  101.4     1991-1992  92.8
1932-1933  54.7      1962-1963  89.8      1992-1993  93.2
1933-1934  71.8      1963-1964  71.5      1993-1994  112.7
1934-1935  49.1      1964-1965  70.9      1994-1995  74.6
1935-1936  103.9     1965-1966  98.3      1995-1996  141.4
1936-1937  51.6      1966-1967  55.5      1996-1997  97.6
1937-1938  82.4      1967-1968  66.1      1997-1998  75.6
1938-1939  83.6      1968-1969  78.4      1998-1999  100.5
                                          1999-2000  63.6
                                          2000-2001  158.7
                                          2001-2002  132.4

Figure 8.4 Histogram for Problem 8.2(h) (source: Chen and Krieger, 1976)

Figure 8.5 Frequency diagrams for Problem 8.2(i) (source: Ferreira, 1974)

Figure 8.6 Histogram for Problem 8.2(j) (source: Link, 1972)

Figure 8.7 Histogram for Problem 8.2(k) (source: Jelliffe et al., 1970).
Note: the horizontal axis shows the difference between the observed and
computed urinary digitoxin excretion, in micrograms per day.

Figure 8.8 Histogram for Problem 8.2(l); horizontal axis: live load (psf)
(source: Dunham, 1952)
Parameter Estimation


Suppose that a probabilistic model, represented by probability density function
(pdf) f(x), has been chosen for a physical or natural phenomenon for which
parameters $\theta_1, \theta_2, \ldots$ are to be estimated from independently observed data
$x_1, x_2, \ldots, x_n$. Let us consider for a moment a single parameter θ for simplicity
and write $f(x; \theta)$ to mean a specified probability distribution where θ is the unknown
parameter to be estimated. The parameter estimation problem is then one of
determining an appropriate function of $x_1, x_2, \ldots, x_n$, say $h(x_1, x_2, \ldots, x_n)$, which
gives the 'best' estimate of θ. In order to develop systematic estimation procedures,
we need to make more precise the terms that were defined rather loosely in the
preceding chapter and introduce some new concepts needed for this development.


9.1 SAMPLES AND STATISTICS 

Given an independent data set $x_1, x_2, \ldots, x_n$, let

$$\hat{\theta} = h(x_1, x_2, \ldots, x_n) \qquad (9.1)$$

be an estimate of parameter θ. In order to ascertain its general properties, we
recognize that, if the experiment that yielded the data set were to be repeated,
we would obtain different values for $x_1, x_2, \ldots, x_n$. The function $h(x_1, x_2, \ldots, x_n)$,
when applied to the new data set, would yield a different value for $\hat{\theta}$. We thus see
that estimate $\hat{\theta}$ is itself a random variable possessing a probability distribution,
which depends both on the functional form defined by h and on the distribution
of the underlying random variable X. The appropriate representation of $\hat{\theta}$ is thus

$$\hat{\Theta} = h(X_1, X_2, \ldots, X_n), \qquad (9.2)$$

where $X_1, X_2, \ldots, X_n$ are random variables, representing a sample from random
variable X, which is referred to in this context as the population. In practically
all applications, we shall assume that sample $X_1, X_2, \ldots, X_n$ possesses the
following properties:

• Property 1: $X_1, X_2, \ldots, X_n$ are independent.

• Property 2: $f_{X_j}(x) = f_X(x)$ for all x, $j = 1, 2, \ldots, n$.

The random variables $X_1, \ldots, X_n$ satisfying these conditions are called a random
sample of size n. The word 'random' in this definition is usually omitted for the
sake of brevity. If X is a random variable of the discrete type with probability
mass function (pmf) $p_X(x)$, then $p_{X_j}(x) = p_X(x)$ for each j.

A specific set of observed values $(x_1, x_2, \ldots, x_n)$ is a set of sample values
assumed by the sample. The problem of parameter estimation is one class in
the broader topic of statistical inference, in which our object is to make
inferences about various aspects of the underlying population distribution on the
basis of observed sample values. For the purpose of clarification, the
interrelationships among X, $(X_1, X_2, \ldots, X_n)$, and $(x_1, x_2, \ldots, x_n)$ are
schematically shown in Figure 9.1.

Let us note that the properties of a sample as given above imply that certain
conditions are imposed on the manner in which observed data are obtained.
Each datum point must be observed from the population independently and
under identical conditions. In sampling a population of percentage yield, as
discussed in Chapter 8, for example, one would avoid taking adjacent batches if
correlation between them is to be expected.

A statistic is any function of a given sample $X_1, X_2, \ldots, X_n$ that does not
depend on the unknown parameter. The function $h(X_1, X_2, \ldots, X_n)$ in Equation
(9.2) is thus a statistic for which the value can be determined once the sample
values have been observed. It is important to note that a statistic, being a function
of random variables, is a random variable. When it is used to estimate a distribution
parameter, its statistical properties, such as mean, variance, and distribution, give
information concerning the quality of this particular estimation procedure. Certain
statistics play an important role in statistical estimation theory; these include the
sample mean, sample variance, order statistics, and other sample moments. Some
properties of these important statistics are discussed below.
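To see concretely that an estimate is itself a random variable, the sketch below repeats a hypothetical experiment several times and applies the same rule h (here the arithmetic mean) to each data set. The normal population N(70, 4), the sample size, and the seed are assumptions chosen only for illustration.

```python
import random

def sample_mean(xs):
    return sum(xs) / len(xs)

def repeated_estimates(n, trials, m=70.0, sigma=2.0, seed=1):
    """Draw `trials` independent samples of size n from a hypothetical
    N(m, sigma^2) population and return the sample mean of each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(m, sigma) for _ in range(n)]
        estimates.append(sample_mean(sample))
    return estimates

est = repeated_estimates(n=10, trials=5)
print([round(e, 2) for e in est])  # five different estimates of m = 70
```

Each repetition yields a different value of $\hat{\theta}$, which is exactly why its mean, variance, and distribution are the natural measures of the quality of the estimation rule.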


Figure 9.1 Population, sample, and sample values (schematic: population X;
sample $X_1, X_2, \ldots, X_n$; sample values $x_1, x_2, \ldots, x_n$)
9.1.1 SAMPLE MEAN
The statistic

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (9.3)$$

is called the sample mean of population X. Let the population mean and
variance be, respectively,

$$E\{X\} = m, \qquad \mathrm{var}\{X\} = \sigma^2. \qquad (9.4)$$

The mean and variance of $\bar{X}$, the sample mean, are easily found to be

$$E\{\bar{X}\} = \frac{1}{n}\sum_{i=1}^{n} E\{X_i\} = \frac{1}{n}(nm) = m, \qquad (9.5)$$

and, owing to independence (all cross terms $E\{(X_i - m)(X_j - m)\}$, $i \neq j$, vanish),

$$\mathrm{var}\{\bar{X}\} = E\{(\bar{X} - m)^2\} = E\left\{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - m)\right]^2\right\} = \frac{\sigma^2}{n}, \qquad (9.6)$$

which is inversely proportional to sample size n. As n increases, the variance of $\bar{X}$
decreases and the distribution of $\bar{X}$ becomes sharply peaked at $E\{\bar{X}\} = m$. Hence,
it is intuitively clear that statistic $\bar{X}$ provides a good procedure for estimating
population mean m. This is another statement of the law of large numbers that
was discussed in Example 4.12 (page 96) and Example 4.13 (page 97).

Since $\bar{X}$ is a (scaled) sum of independent random variables, its distribution can also be
determined either by the use of techniques developed in Chapter 5 or by means of
the method of characteristic functions given in Section 4.5. We further observe
that, on the basis of the central limit theorem (Section 7.2.1), sample mean $\bar{X}$
approaches a normal distribution as $n \to \infty$. More precisely, random variable

$$\frac{\bar{X} - m}{\sigma/\sqrt{n}}$$

approaches N(0, 1) as $n \to \infty$.
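Equations (9.5) and (9.6) can be checked numerically. The Monte Carlo sketch below assumes, purely for illustration, a Uniform(0, 1) population (m = 1/2, σ² = 1/12), samples of size n = 10, and 20 000 repetitions; the empirical mean and variance of $\bar{X}$ should come out close to m and σ²/n.

```python
import random

def moments_of_sample_mean(n, reps, seed=0):
    """Monte Carlo estimates of E{Xbar} and var{Xbar} for samples of size n
    from a Uniform(0, 1) population (m = 1/2, sigma^2 = 1/12)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        means.append(sum(rng.random() for _ in range(n)) / n)
    mu = sum(means) / reps
    var = sum((x - mu) ** 2 for x in means) / reps
    return mu, var

n = 10
mean_xbar, var_xbar = moments_of_sample_mean(n, reps=20000)
print(mean_xbar, var_xbar, (1 / 12) / n)  # compare with m = 0.5 and sigma^2/n
```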
9.1.2 SAMPLE VARIANCE
The statistic

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \qquad (9.7)$$

is called the sample variance of population X. The mean of $S^2$ can be found by
expanding the squares in the sum and taking termwise expectations. We first
write Equation (9.7) as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(X_i - m) - \frac{1}{n}\sum_{j=1}^{n}(X_j - m)\right]^2
= \frac{1}{n}\sum_{i=1}^{n}(X_i - m)^2 - \frac{1}{n(n-1)}\sum_{\substack{i,j=1\\ i\neq j}}^{n}(X_i - m)(X_j - m).$$

Taking termwise expectations and noting mutual independence (the first sum has
expectation $n\sigma^2$, and every cross term with $i \neq j$ has expectation zero), we have

$$E\{S^2\} = \sigma^2, \qquad (9.8)$$

where m and σ² are defined in Equations (9.4). We remark at this point that the
reason for using 1/(n − 1) rather than 1/n in Equation (9.7) is to make the mean
of $S^2$ equal to σ². As we shall see in the next section, this is a desirable property
for $S^2$ if it is to be used to estimate σ², the true variance of X.

The variance of $S^2$ is found from

$$\mathrm{var}\{S^2\} = E\{(S^2 - \sigma^2)^2\}. \qquad (9.9)$$

Upon expanding the right-hand side and carrying out expectations term by
term, we find that

$$\mathrm{var}\{S^2\} = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right), \qquad (9.10)$$

where $\mu_4$ is the fourth central moment of X; that is,

$$\mu_4 = E\{(X - m)^4\}. \qquad (9.11)$$

Equation (9.10) shows again that the variance of $S^2$ is an inverse function of n.
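A quick simulation can be set against Equations (9.8) and (9.10). The sketch assumes an N(0, 1) population, for which $\mu_4 = 3\sigma^4$, and samples of size n = 5; Equation (9.10) then gives var{S²} = 0.5. The population, sample size, and number of repetitions are illustrative choices only.

```python
import random

def sample_variance(xs):
    """S^2 of Equation (9.7), with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def simulate_s2(n, reps, seed=0):
    """Empirical mean and variance of S^2 for samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    values = [sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
              for _ in range(reps)]
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / reps
    return mean, var

n, sigma2, mu4 = 5, 1.0, 3.0          # N(0, 1): mu_4 = 3 sigma^4
var_theory = (mu4 - (n - 3) / (n - 1) * sigma2 ** 2) / n   # Equation (9.10)
mean_s2, var_s2 = simulate_s2(n, reps=40000)
print(mean_s2, var_s2, var_theory)    # var_theory = 0.5 for these values
```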
In principle, the distribution of $S^2$ can be derived with use of techniques
advanced in Chapter 5. It is, however, a tedious process because of the complex
nature of the expression for $S^2$ as defined by Equation (9.7). For the case in
which population X is distributed according to N(m, σ²), we have the following
result (Theorem 9.1).

Theorem 9.1: Let $S^2$ be the sample variance of size n from normal population
N(m, σ²); then $(n-1)S^2/\sigma^2$ has a chi-squared (χ²) distribution with (n − 1)
degrees of freedom.

Proof of Theorem 9.1: the chi-squared distribution is given in Section 7.4.2.
In order to sketch a proof for this theorem, let us note from Section 7.4.2 that
random variable Y,

$$Y = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - m)^2, \qquad (9.12)$$

has a chi-squared distribution with n degrees of freedom, since each term in the
sum is the square of a standard normal random variable and is independent of the
other terms in the sum. Now, we can show that the difference between Y and
$(n-1)S^2/\sigma^2$ is

$$Y - \frac{(n-1)S^2}{\sigma^2} = \left(\frac{\bar{X} - m}{\sigma/\sqrt{n}}\right)^2. \qquad (9.13)$$

Since the right-hand side of Equation (9.13) is a random variable having a
chi-squared distribution with one degree of freedom, Equation (9.13) leads to the
result that $(n-1)S^2/\sigma^2$ is chi-squared distributed with (n − 1) degrees of freedom,
provided that independence exists between $(n-1)S^2/\sigma^2$ and

$$\left(\frac{\bar{X} - m}{\sigma/\sqrt{n}}\right)^2.$$

The proof of this independence is not given here but can be found in more
advanced texts (e.g. Anderson and Bancroft, 1952).
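A chi-squared random variable with k degrees of freedom has mean k and variance 2k, so Theorem 9.1 can be checked roughly by simulation. The sketch below assumes an N(70, 4) population and samples of size n = 6 (both illustrative choices); the draws of $(n-1)S^2/\sigma^2$ should then show mean near 5 and variance near 10.

```python
import random

def chi2_statistic_draws(n, reps, m=70.0, sigma=2.0, seed=0):
    """Draws of (n - 1) S^2 / sigma^2 for samples of size n from N(m, sigma^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xs = [rng.gauss(m, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)   # equals (n - 1) S^2
        draws.append(ss / sigma ** 2)
    return draws

n = 6
q = chi2_statistic_draws(n, reps=40000)
mean_q = sum(q) / len(q)
var_q = sum((v - mean_q) ** 2 for v in q) / len(q)
# A chi-squared variable with n - 1 = 5 degrees of freedom has mean 5, variance 10
print(mean_q, var_q)
```

Matching the first two moments does not prove the distributional claim; it is only a sanity check that the scaling by $(n-1)/\sigma^2$ and the n − 1 degrees of freedom are right.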


9.1.3 SAMPLE MOMENTS
The kth sample moment is

$$M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k. \qquad (9.14)$$

Following similar procedures as given above, we can show that

$$E\{M_k\} = \alpha_k, \qquad \mathrm{var}\{M_k\} = \frac{1}{n}\left(\alpha_{2k} - \alpha_k^2\right), \qquad (9.15)$$

where $\alpha_k$ is the kth moment of population X.
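Equation (9.15) can be checked the same way. The sketch assumes a Uniform(0, 1) population, whose moments are $\alpha_k = 1/(k+1)$, and looks at $M_2$ for samples of size n = 10; the population and sizes are illustrative assumptions.

```python
import random

def sample_moment(xs, k):
    """k-th sample moment M_k of Equation (9.14)."""
    return sum(x ** k for x in xs) / len(xs)

def simulate_m2(n, reps, seed=0):
    """Empirical mean and variance of M_2 for Uniform(0, 1) samples of size n."""
    rng = random.Random(seed)
    vals = [sample_moment([rng.random() for _ in range(n)], 2) for _ in range(reps)]
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / reps
    return mean, var

# Uniform(0, 1) population: alpha_k = 1 / (k + 1)
n = 10
alpha2, alpha4 = 1 / 3, 1 / 5
mean_m2, var_m2 = simulate_m2(n, reps=20000)
print(mean_m2, alpha2)                        # E{M_2} = alpha_2
print(var_m2, (alpha4 - alpha2 ** 2) / n)     # var{M_2} from Equation (9.15)
```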


9.1.4 ORDER STATISTICS

A sample $X_1, X_2, \ldots, X_n$ can be ranked in order of increasing numerical magnitude.
Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be such a rearranged sample, where $X_{(1)}$ is the
smallest and $X_{(n)}$ the largest. Then $X_{(k)}$ is called the kth-order statistic. Extreme
values $X_{(1)}$ and $X_{(n)}$ are of particular importance in applications, and their
properties have been discussed in Section 7.6.

In terms of the probability distribution function (PDF) of population X,
$F_X(x)$, it follows from Equations (7.89) and (7.91) that the PDFs of $X_{(1)}$ and
$X_{(n)}$ are

$$F_{X_{(1)}}(x) = 1 - [1 - F_X(x)]^n, \qquad (9.16)$$

$$F_{X_{(n)}}(x) = F_X^n(x). \qquad (9.17)$$

If X is continuous, the pdfs of $X_{(1)}$ and $X_{(n)}$ are of the form [see Equations (7.90)
and (7.92)]

$$f_{X_{(1)}}(x) = n[1 - F_X(x)]^{n-1} f_X(x), \qquad (9.18)$$

$$f_{X_{(n)}}(x) = nF_X^{n-1}(x) f_X(x). \qquad (9.19)$$

The means and variances of order statistics can be obtained through integration,
but they are not expressible as simple functions of the moments of population X.
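Equations (9.16) and (9.17) are easy to check against simulated extremes. The sketch assumes a Uniform(0, 1) population, for which $F_X(x) = x$ on [0, 1], and uses n = 3 and x = 0.5, giving theoretical values 0.875 and 0.125; these choices are for illustration only.

```python
import random

def extreme_value_check(n, x, reps, seed=0):
    """Compare empirical P(X(1) <= x) and P(X(n) <= x) with Equations
    (9.16) and (9.17) for a Uniform(0, 1) population, where F_X(x) = x."""
    rng = random.Random(seed)
    min_hits = 0
    max_hits = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        if min(xs) <= x:
            min_hits += 1
        if max(xs) <= x:
            max_hits += 1
    empirical = (min_hits / reps, max_hits / reps)
    theory = (1 - (1 - x) ** n, x ** n)
    return empirical, theory

empirical, theory = extreme_value_check(n=3, x=0.5, reps=40000)
print(empirical, theory)   # theory = (0.875, 0.125)
```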


9.2 QUALITY CRITERIA FOR ESTIMATES

We are now in a position to propose a number of criteria under which the
quality of an estimate can be evaluated. These criteria define generally desirable
properties for an estimate to have as well as provide a guide by which the
quality of one estimate can be compared with that of another.
Before proceeding, a remark is in order regarding the notation to be used. As seen
in Equation (9.2), our objective in parameter estimation is to determine a statistic

$$\hat{\Theta} = h(X_1, X_2, \ldots, X_n), \qquad (9.20)$$

which gives a good estimate of parameter θ. This statistic will be called an
estimator for θ, for which properties, such as mean, variance, or distribution,
provide a measure of quality of this estimator. Once we have observed sample
values $x_1, x_2, \ldots, x_n$, the observed estimator,

$$\hat{\theta} = h(x_1, x_2, \ldots, x_n), \qquad (9.21)$$

has a numerical value and will be called an estimate of parameter θ.


9.2.1 UNBIASEDNESS

An estimator $\hat{\Theta}$ is said to be an unbiased estimator for θ if

$$E\{\hat{\Theta}\} = \theta, \qquad (9.22)$$

for all θ. This is clearly a desirable property for $\hat{\Theta}$, which states that, on average,
we expect $\hat{\Theta}$ to be close to true parameter value θ. Let us note here that the
requirement of unbiasedness may lead to other undesirable consequences.
Hence, the overall quality of an estimator does not rest on any single criterion
but on a set of criteria.

We have studied two statistics, $\bar{X}$ and $S^2$, in Sections 9.1.1 and 9.1.2. It is seen
from Equations (9.5) and (9.8) that, if $\bar{X}$ and $S^2$ are used as estimators for the
population mean m and population variance σ², respectively, they are unbiased
estimators. This nice property for $S^2$ suggests that the sample variance defined
by Equation (9.7) is preferred over the more natural choice obtained by replacing
1/(n − 1) by 1/n in Equation (9.7). Indeed, if we let

$$S^{2*} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2, \qquad (9.23)$$

its mean is

$$E\{S^{2*}\} = \frac{n-1}{n}\sigma^2,$$

and estimator $S^{2*}$ has a bias indicated by the coefficient (n − 1)/n.
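The bias factor (n − 1)/n shows up clearly in simulation. The sketch below assumes an N(0, 1) population (so σ² = 1) and samples of size n = 4, and averages both $S^2$ and $S^{2*}$ over many samples; the first average should be near 1 and the second near 0.75. These numerical choices are illustrative assumptions.

```python
import random

def compare_variance_estimators(n, reps, sigma=1.0, seed=0):
    """Average S^2 (divisor n - 1, Eq. 9.7) and S^2* (divisor n, Eq. 9.23)
    over many samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_s2_star = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        total_s2 += ss / (n - 1)
        total_s2_star += ss / n
    return total_s2 / reps, total_s2_star / reps

n = 4
mean_s2, mean_s2_star = compare_variance_estimators(n, reps=40000)
print(mean_s2, mean_s2_star, (n - 1) / n)   # sigma^2 = 1: compare with 1 and 0.75
```

The bias matters most for small n; as n grows, (n − 1)/n approaches one and the two estimators become practically indistinguishable.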
9.2.2 MINIMUM VARIANCE

It seems natural that, if $\hat{\Theta} = h(X_1, X_2, \ldots, X_n)$ is to qualify as a good estimator
for θ, not only should its mean be close to true value θ but also there should be a
good probability that any of its observed values $\hat{\theta}$ will be close to θ. This can be
achieved by selecting a statistic in such a way that not only is $\hat{\Theta}$ unbiased but
also its variance is as small as possible. Hence, the second desirable property is
one of minimum variance.

Definition 9.1. Let $\hat{\Theta}$ be an unbiased estimator for θ. It is an unbiased
minimum-variance estimator for θ if, for all other unbiased estimators $\hat{\Theta}^*$ of θ
from the same sample,

$$\mathrm{var}\{\hat{\Theta}\} \le \mathrm{var}\{\hat{\Theta}^*\}, \qquad (9.24)$$

for all θ.

Given two unbiased estimators for a given parameter, the one with smaller
variance is preferred because smaller variance implies that observed values of
the estimator tend to be closer to its mean, the true parameter value.

Example 9.1. Problem: we have seen that $\bar{X}$ obtained from a sample of size n
is an unbiased estimator for population mean m. Does the quality of $\bar{X}$ improve
as n increases?

Answer: we easily see from Equation (9.5) that the mean of $\bar{X}$ is independent
of the sample size; it thus remains unbiased as n increases. Its variance, on the
other hand, as given by Equation (9.6) is

$$\mathrm{var}\{\bar{X}\} = \frac{\sigma^2}{n}, \qquad (9.25)$$

which decreases as n increases. Thus, based on the minimum variance criterion,
the quality of $\bar{X}$ as an estimator for m improves as n increases.

Example 9.2. Part 1. Problem: based on a fixed sample size n, is $\bar{X}$ the best
estimator for m in terms of unbiasedness and minimum variance?

Approach: in order to answer this question, it is necessary to show that the
variance of $\bar{X}$ as given by Equation (9.25) is the smallest among all unbiased
estimators that can be constructed from the sample. This is certainly difficult to
do directly. However, a powerful theorem (Theorem 9.2) shows that it is possible to
determine the minimum achievable variance of any unbiased estimator
obtained from a given sample. This lower bound on the variance thus permits
us to answer questions such as the one just posed.
Theorem 9.2: the Cramér-Rao inequality. Let $X_1, X_2, \ldots, X_n$ denote a sample
of size n from a population X with pdf $f(x; \theta)$, where θ is the unknown parameter,
and let $\hat{\Theta} = h(X_1, X_2, \ldots, X_n)$ be an unbiased estimator for θ. Then, the
variance of $\hat{\Theta}$ satisfies the inequality

$$\mathrm{var}\{\hat{\Theta}\} \ge \left\{nE\left[\left(\frac{\partial \ln f(X;\theta)}{\partial\theta}\right)^2\right]\right\}^{-1}, \qquad (9.26)$$

if the indicated expectation and differentiation exist. An analogous result with
$p(X; \theta)$ replacing $f(X; \theta)$ is obtained when X is discrete.

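Proof of Theorem 9.2: the joint probability density function (jpdf) of $X_1, X_2, \ldots,$
and $X_n$ is, because of their mutual independence, $f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)$. The
mean of statistic $\hat{\Theta} = h(X_1, X_2, \ldots, X_n)$ is

$$E\{\hat{\Theta}\} = E\{h(X_1, X_2, \ldots, X_n)\},$$

and, since $\hat{\Theta}$ is unbiased, it gives

$$\theta = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(x_1, \ldots, x_n) f(x_1;\theta)\cdots f(x_n;\theta)\, dx_1\cdots dx_n. \qquad (9.27)$$

Another relation we need is the identity

$$1 = \int_{-\infty}^{\infty} f(x_i;\theta)\, dx_i, \qquad i = 1, 2, \ldots, n. \qquad (9.28)$$

Upon differentiating both sides of each of Equations (9.27) and (9.28) with
respect to θ, we have

$$1 = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(x_1, \ldots, x_n)\left[\sum_{j=1}^{n}\frac{1}{f(x_j;\theta)}\frac{\partial f(x_j;\theta)}{\partial\theta}\right] f(x_1;\theta)\cdots f(x_n;\theta)\, dx_1\cdots dx_n$$

$$\quad = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(x_1, \ldots, x_n)\left[\sum_{j=1}^{n}\frac{\partial \ln f(x_j;\theta)}{\partial\theta}\right] f(x_1;\theta)\cdots f(x_n;\theta)\, dx_1\cdots dx_n, \qquad (9.29)$$

$$0 = \int_{-\infty}^{\infty}\frac{\partial f(x_i;\theta)}{\partial\theta}\, dx_i = \int_{-\infty}^{\infty}\frac{\partial \ln f(x_i;\theta)}{\partial\theta} f(x_i;\theta)\, dx_i, \qquad i = 1, 2, \ldots, n. \qquad (9.30)$$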
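Let us define a new random variable Y by 

$$Y = \sum_{j=1}^{n} \frac{\partial \ln f(X_j;\theta)}{\partial \theta}. \qquad (9.31)$$

Equation (9.30) shows that 

$$E\{Y\} = 0.$$

Moreover, since Y is a sum of n independent random variables, each with mean 
zero and variance $E\{[\partial \ln f(X;\theta)/\partial\theta]^2\}$, the variance of Y is the sum of the n 
variances and has the form 

$$\sigma_Y^2 = n E\left\{ \left[ \frac{\partial \ln f(X;\theta)}{\partial \theta} \right]^2 \right\}. \qquad (9.32)$$

Now, it follows from Equation (9.29) that 

$$1 = E\{\hat{\Theta} Y\}. \qquad (9.33)$$

Recall that 

$$E\{\hat{\Theta} Y\} = E\{\hat{\Theta}\} E\{Y\} + \rho_{\hat{\Theta}Y}\, \sigma_{\hat{\Theta}}\, \sigma_Y,$$

or 

$$1 = (\theta)(0) + \rho_{\hat{\Theta}Y}\, \sigma_{\hat{\Theta}}\, \sigma_Y. \qquad (9.34)$$

As a consequence of the property $\rho_{\hat{\Theta}Y}^2 \le 1$, we finally have 

$$\sigma_{\hat{\Theta}}^2\, \sigma_Y^2 \ge 1,$$

or, using Equation (9.32), 

$$\sigma_{\hat{\Theta}}^2 \ge \frac{1}{\sigma_Y^2} = \left\{ n E\left[ \left( \frac{\partial \ln f(X;\theta)}{\partial \theta} \right)^2 \right] \right\}^{-1}. \qquad (9.35)$$

The proof is now complete. 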

In the above, we have assumed that differentiation with respect to θ under an 
integral or sum sign is permissible. Equation (9.26) gives a lower bound on the 
variance of any unbiased estimator, and it expresses a fundamental limitation 
on the accuracy with which a parameter can be estimated. We also note that 
this lower bound is, in general, a function of θ, the true parameter value. 

Several remarks in connection with the Cramer-Rao lower bound (CRLB) 
are now in order. 


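• Remark 1: the expectation in Equation (9.26) is equivalent to 
$-E\{\partial^2 \ln f(X;\theta)/\partial\theta^2\}$, or 

$$\operatorname{var}\{\hat{\Theta}\} \ge -\left\{ n E\left[ \frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2} \right] \right\}^{-1}. \qquad (9.36)$$

This alternate expression offers computational advantages in some cases. 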

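• Remark 2: the result given by Equation (9.26) can be extended easily to 
multiple parameter cases. Let $\theta_1, \theta_2, \ldots,$ and $\theta_m$ ($m < n$) be the unknown 
parameters in $f(x;\theta_1, \ldots, \theta_m)$, which are to be estimated on the basis of a 
sample of size n. In vector notation, we can write 

$$\boldsymbol{\theta}^T = [\theta_1 \;\; \theta_2 \;\; \cdots \;\; \theta_m], \qquad (9.37)$$

with corresponding vector unbiased estimator 

$$\hat{\boldsymbol{\Theta}}^T = [\hat{\Theta}_1 \;\; \hat{\Theta}_2 \;\; \cdots \;\; \hat{\Theta}_m]. \qquad (9.38)$$

Following similar steps in the derivation of Equation (9.26), we can show that 
the Cramer-Rao inequality for multiple parameters is of the form 

$$\operatorname{cov}\{\hat{\boldsymbol{\Theta}}\} \ge \frac{\Lambda^{-1}}{n}, \qquad (9.39)$$

where $\Lambda^{-1}$ is the inverse of matrix Λ for which the elements are 

$$\Lambda_{ij} = E\left\{ \frac{\partial \ln f(X;\boldsymbol{\theta})}{\partial \theta_i} \frac{\partial \ln f(X;\boldsymbol{\theta})}{\partial \theta_j} \right\}, \qquad i, j = 1, 2, \ldots, m. \qquad (9.40)$$

Equation (9.39) implies that 

$$\operatorname{var}\{\hat{\Theta}_j\} \ge \frac{(\Lambda^{-1})_{jj}}{n} \ge \frac{1}{n \Lambda_{jj}}, \qquad j = 1, 2, \ldots, m, \qquad (9.41)$$

where $(\Lambda^{-1})_{jj}$ is the jjth element of $\Lambda^{-1}$. 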

• Remark 3: the CRLB can be transformed easily under a transformation of 
the parameter. Suppose that, instead of θ, parameter $\phi = g(\theta)$ is of interest, 
which is a one-to-one transformation and differentiable with respect to θ; 
then, 

$$\text{CRLB for } \operatorname{var}\{\hat{\Phi}\} = \left[ \frac{dg(\theta)}{d\theta} \right]^2 [\text{CRLB for } \operatorname{var}\{\hat{\Theta}\}], \qquad (9.42)$$

where $\hat{\Phi}$ is an unbiased estimator for φ. 

• Remark 4: given an unbiased estimator $\hat{\Theta}$ for parameter θ, the ratio of its 
CRLB to its variance is called the efficiency of $\hat{\Theta}$. The efficiency of any 
unbiased estimator is thus always less than or equal to 1. An unbiased 
estimator with efficiency equal to 1 is said to be efficient. We must point 
out, however, that efficient estimators exist only under certain conditions. 

We are finally in the position to answer the question posed in Example 9.2. 

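Example 9.2. Part 2. Answer: first, we note that, in order to apply the CRLB, 
pdf $f(x;\theta)$ of population X must be known. Suppose that $f(x;m)$ for this 
example is $N(m, \sigma^2)$. We have 

$$\ln f(X;m) = \ln\left\{ \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[ -\frac{(X-m)^2}{2\sigma^2} \right] \right\} = \ln \frac{1}{(2\pi)^{1/2}\sigma} - \frac{(X-m)^2}{2\sigma^2},$$

and 

$$\frac{\partial \ln f(X;m)}{\partial m} = \frac{X-m}{\sigma^2}.$$

Thus, 

$$E\left\{ \left[ \frac{\partial \ln f(X;m)}{\partial m} \right]^2 \right\} = \frac{E\{(X-m)^2\}}{\sigma^4} = \frac{1}{\sigma^2}.$$

Equation (9.26) then shows that the CRLB for the variance of any unbiased 
estimator for m is $\sigma^2/n$. Since the variance of $\bar{X}$ is $\sigma^2/n$, it has the minimum 
variance among all unbiased estimators for m when population X is distributed 
normally. 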

Example 9.3. Problem: consider a population X having a normal distribution 
$N(0, \sigma^2)$, where $\sigma^2$ is an unknown parameter to be estimated from a sample of 
size $n > 1$. (a) Determine the CRLB for the variance of any unbiased estimator 
for $\sigma^2$. (b) Is sample variance $S^2$ an efficient estimator for $\sigma^2$? 
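Answer: let us denote $\sigma^2$ by θ. Then, 

$$f(X;\theta) = \frac{1}{(2\pi\theta)^{1/2}} \exp\left( -\frac{X^2}{2\theta} \right),$$

and 

$$\ln f(X;\theta) = -\frac{X^2}{2\theta} - \frac{1}{2}\ln 2\pi\theta,$$
$$\frac{\partial \ln f(X;\theta)}{\partial \theta} = \frac{X^2}{2\theta^2} - \frac{1}{2\theta},$$
$$\frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2} = -\frac{X^2}{\theta^3} + \frac{1}{2\theta^2},$$
$$E\left\{ \frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2} \right\} = -\frac{\theta}{\theta^3} + \frac{1}{2\theta^2} = -\frac{1}{2\theta^2}.$$

Hence, according to Equation (9.36), the CRLB for the variance of any 
unbiased estimator for θ is $2\theta^2/n$. 

For $S^2$, it has been shown in Section 9.1.2 that it is an unbiased estimator for 
θ and that its variance is [see Equation (9.10)] 

$$\operatorname{var}\{S^2\} = \frac{1}{n}\left( \mu_4 - \frac{n-3}{n-1}\sigma^4 \right) = \frac{2\sigma^4}{n-1} = \frac{2\theta^2}{n-1},$$

since $\mu_4 = 3\sigma^4$ when X is normally distributed. The efficiency of $S^2$, denoted by 
$e(S^2)$, is thus 

$$e(S^2) = \frac{\text{CRLB}}{\operatorname{var}\{S^2\}} = \frac{n-1}{n}.$$

We see that the sample variance is not an efficient estimator for θ in this 
case. It is, however, asymptotically efficient in the sense that $e(S^2) \to 1$ as 
$n \to \infty$. 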
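A short simulation can be used to check the efficiency computation. The sketch below works under the assumptions of Example 9.3, a normal population with mean zero. It compares a simulated variance of $S^2$ with $2\theta^2/(n-1)$ and with the CRLB $2\theta^2/n$. The values θ = 2 and n = 5 and all function names are hypothetical, chosen only for illustration.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance S^2 (divisor n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def crlb(theta, n):
    return 2.0 * theta**2 / n          # CRLB for unbiased estimators of theta = sigma^2

def var_s2(theta, n):
    return 2.0 * theta**2 / (n - 1)    # var{S^2} for a normal population

def efficiency_s2(theta, n):
    return crlb(theta, n) / var_s2(theta, n)   # equals (n - 1)/n

def simulated_var_s2(theta, n, trials=20000, seed=2):
    """Variance of S^2 across simulated N(0, theta) samples of size n."""
    rng = random.Random(seed)
    sd = theta ** 0.5
    values = [sample_variance([rng.gauss(0.0, sd) for _ in range(n)])
              for _ in range(trials)]
    return sample_variance(values)

theta, n = 2.0, 5
sim = simulated_var_s2(theta, n)
print(sim, var_s2(theta, n), crlb(theta, n))
print(efficiency_s2(theta, n))   # (n - 1)/n = 0.8
```

The simulated variance should come out close to $2\theta^2/(n-1) = 2$, which is above the CRLB of 1.6. That gap corresponds to an efficiency of (n − 1)/n.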
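Example 9.4. Problem: determine the CRLB for the variance of any unbiased 
estimator for θ in the lognormal distribution 

$$f(x;\theta) = \begin{cases} \dfrac{1}{x(2\pi\theta)^{1/2}} \exp\left( -\dfrac{1}{2\theta}\ln^2 x \right), & \text{for } x > 0 \text{ and } \theta > 0; \\ 0, & \text{elsewhere.} \end{cases}$$

Answer: we have 

$$\frac{\partial \ln f(X;\theta)}{\partial \theta} = -\frac{1}{2\theta} + \frac{\ln^2 X}{2\theta^2},$$
$$\frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2} = \frac{1}{2\theta^2} - \frac{\ln^2 X}{\theta^3},$$
$$E\left\{ \frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2} \right\} = \frac{1}{2\theta^2} - \frac{\theta}{\theta^3} = -\frac{1}{2\theta^2},$$

where we have used $E\{\ln^2 X\} = \theta$, since ln X is $N(0, \theta)$. It thus follows from Equation (9.36) that the CRLB is $2\theta^2/n$. 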
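Remark 1 can also be checked numerically for this lognormal model. The sketch below assumes, as in Example 9.4, that ln X is normal with mean 0 and variance θ. It uses Monte Carlo to estimate both forms of the expectation, the one in Equation (9.26) and the one in Equation (9.36). The choices θ = 1.5 and n = 10, the draw count, and the function names are hypothetical.

```python
import math
import random

def score(x, theta):
    """d ln f(x; theta) / d theta for the lognormal pdf of Example 9.4."""
    return -1.0 / (2.0 * theta) + math.log(x) ** 2 / (2.0 * theta**2)

def second_derivative(x, theta):
    """d^2 ln f(x; theta) / d theta^2 for the same pdf."""
    return 1.0 / (2.0 * theta**2) - math.log(x) ** 2 / theta**3

def fisher_information_mc(theta, draws=200000, seed=3):
    """Monte Carlo estimates of E{score^2} and -E{second derivative}."""
    rng = random.Random(seed)
    # lognormvariate(mu, sigma) gives ln X ~ N(mu, sigma^2), so sigma = sqrt(theta) here
    xs = [rng.lognormvariate(0.0, math.sqrt(theta)) for _ in range(draws)]
    form_a = sum(score(x, theta) ** 2 for x in xs) / draws
    form_b = -sum(second_derivative(x, theta) for x in xs) / draws
    return form_a, form_b

theta, n = 1.5, 10
a, b = fisher_information_mc(theta)
print(a, b, 1.0 / (2.0 * theta**2))   # both estimates should be near 1/(2 theta^2)
print("CRLB:", 2.0 * theta**2 / n)     # 1 / (n * information) = 2 theta^2 / n
```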

Before going to the next criterion, it is worth mentioning again that, although 
unbiasedness as well as small variance is desirable, it does not mean that we should 
discard all biased estimators as inferior. Consider two estimators for a parameter θ, 
$\hat{\Theta}_1$ and $\hat{\Theta}_2$, the pdfs of which are depicted in Figure 9.2(a). Although $\hat{\Theta}_2$ is biased, 
because of its smaller variance, the probability of an observed value of $\hat{\Theta}_2$ being 
closer to the true value θ can well be higher than that associated with an observed 
value of $\hat{\Theta}_1$. Hence, one can argue convincingly that $\hat{\Theta}_2$ is the better estimator of 
the two. A more dramatic situation is shown in Figure 9.2(b). Clearly, based on a 
particular sample of size n, an observed value of $\hat{\Theta}_2$ will likely be closer to the true 
value θ than that of $\hat{\Theta}_1$, even though $\hat{\Theta}_1$ is again unbiased. It is worthwhile for us to 
reiterate our remark advanced in Section 9.2.1: the quality of an estimator 
does not rest on any single criterion but on a combination of criteria. 

Example 9.5. To illustrate the point that unbiasedness can be outweighed by 
other considerations, consider the problem of estimating parameter θ in the 
binomial distribution 

$$p_X(k) = \theta^k (1-\theta)^{1-k}, \qquad k = 0, 1. \qquad (9.43)$$

Let us propose two estimators, $\hat{\Theta}_1$ and $\hat{\Theta}_2$, for θ given by 

$$\hat{\Theta}_1 = \bar{X}, \qquad \hat{\Theta}_2 = \frac{n\bar{X} + 1}{n + 2}, \qquad (9.44)$$
Figure 9.2 Probability density functions of $\hat{\Theta}_1$ and $\hat{\Theta}_2$ 


where $\bar{X}$ is the sample mean based on a sample of size n. The choice of $\hat{\Theta}_1$ is 
intuitively obvious since $E\{\bar{X}\} = \theta$, and the choice of $\hat{\Theta}_2$ is based on a prior 
probability argument that is not our concern at this point. 

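Since 

$$E\{X\} = \theta \quad \text{and} \quad \sigma_X^2 = \theta(1-\theta),$$

we have 

$$E\{\hat{\Theta}_1\} = \theta, \qquad E\{\hat{\Theta}_2\} = \frac{n\theta + 1}{n + 2}, \qquad (9.45)$$

and 

$$\sigma_{\hat{\Theta}_1}^2 = \frac{\sigma_X^2}{n} = \frac{\theta(1-\theta)}{n}, \qquad \sigma_{\hat{\Theta}_2}^2 = \frac{n^2 \sigma_{\bar{X}}^2}{(n+2)^2} = \frac{n\theta(1-\theta)}{(n+2)^2}. \qquad (9.46)$$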


We see from the above that, although $\hat{\Theta}_2$ is a biased estimator, its variance is 
smaller than that of $\hat{\Theta}_1$, particularly when n is of a moderate value. This is 
a valid reason for choosing $\hat{\Theta}_2$ as a better estimator, compared with $\hat{\Theta}_1$, for θ 
in certain cases. 
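Rational arithmetic makes the comparison in Example 9.5 exact. The sketch below evaluates the bias and the variances from Equations (9.45) and (9.46). It also reports the mean squared error, bias² + variance, as one common way of weighing bias against variance. That summary measure and the function names are assumptions introduced here for illustration.

```python
from fractions import Fraction

def moments_theta1(theta, n):
    """(bias, variance, bias^2 + variance) of Theta1 = X-bar, Eqs. (9.45)-(9.46)."""
    bias = Fraction(0)
    var = theta * (1 - theta) / n
    return bias, var, var + bias**2

def moments_theta2(theta, n):
    """(bias, variance, bias^2 + variance) of Theta2 = (n X-bar + 1)/(n + 2)."""
    bias = (n * theta + 1) / Fraction(n + 2) - theta
    var = n * theta * (1 - theta) / Fraction(n + 2) ** 2
    return bias, var, var + bias**2

for theta in (Fraction(1, 2), Fraction(1, 10)):
    _, v1, mse1 = moments_theta1(theta, 10)
    _, v2, mse2 = moments_theta2(theta, 10)
    print(theta, float(v1), float(v2), float(mse1), float(mse2))
```

With n = 10, $\hat{\Theta}_2$ has the smaller mean squared error at θ = 1/2 (5/288 against 1/40) but the larger one at θ = 1/10 (77/7200 against 9/1000). This gives one concrete sense in which it is the better choice only in certain cases.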


9.2.3 CONSISTENCY 

An estimator $\hat{\Theta}$ is said to be a consistent estimator for θ if, as sample size n 
increases, 

$$\lim_{n\to\infty} P[|\hat{\Theta} - \theta| \ge \varepsilon] = 0, \qquad (9.47)$$

for all $\varepsilon > 0$. The consistency condition states that estimator $\hat{\Theta}$ converges in the 
sense above to the true value θ as sample size increases. It is thus a large-sample 
concept and is a good quality for an estimator to have. 

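Example 9.6. Problem: show that estimator $S^2$ in Example 9.3 is a consistent 
estimator for $\sigma^2$. 

Answer: using the Chebyshev inequality defined in Section 4.2, we 
can write 

$$P\{|S^2 - \sigma^2| \ge \varepsilon\} \le \frac{1}{\varepsilon^2} E\{(S^2 - \sigma^2)^2\}.$$

We have shown that $E\{S^2\} = \sigma^2$ and $\operatorname{var}\{S^2\} = 2\sigma^4/(n-1)$; since $S^2$ is unbiased, $E\{(S^2 - \sigma^2)^2\} = \operatorname{var}\{S^2\}$. Hence, 

$$\lim_{n\to\infty} P\{|S^2 - \sigma^2| \ge \varepsilon\} \le \lim_{n\to\infty} \frac{2\sigma^4}{(n-1)\varepsilon^2} = 0.$$

Thus $S^2$ is a consistent estimator for $\sigma^2$. 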
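A simulation shows the Chebyshev argument of Example 9.6 at work. The sketch below estimates $P\{|S^2 - \sigma^2| \ge \varepsilon\}$ for a standard normal population and prints it next to the bound $2\sigma^4/[(n-1)\varepsilon^2]$. The choice σ² = 1 is an assumption made for illustration, and so are ε = 0.5, the trial count, and the function names.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance S^2 (divisor n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def chebyshev_bound(sigma2, n, eps):
    """var{S^2}/eps^2 = 2 sigma^4 / ((n - 1) eps^2) for a normal population."""
    return 2.0 * sigma2**2 / ((n - 1) * eps**2)

def simulated_exceedance(sigma2, n, eps, trials=5000, seed=4):
    """Fraction of simulated normal samples of size n with |S^2 - sigma^2| >= eps."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    hits = 0
    for _ in range(trials):
        s2 = sample_variance([rng.gauss(0.0, sd) for _ in range(n)])
        if abs(s2 - sigma2) >= eps:
            hits += 1
    return hits / trials

probs = {n: simulated_exceedance(1.0, n, 0.5) for n in (10, 100)}
for n, p in probs.items():
    print(n, p, chebyshev_bound(1.0, n, 0.5))
```

The bound is loose. Even so, the bound and the simulated probability both fall toward zero as n grows, and that is all consistency requires.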

Example 9.6 gives an expedient procedure for checking whether an estimator 
is consistent. We shall state this procedure as a theorem below (Theorem 9.3). It 
is important to note that this theorem gives a sufficient, but not necessary, 
condition for consistency. 

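Theorem 9.3: let $\hat{\Theta}$ be an estimator for θ based on a sample of size n. 
Then, if 

$$\lim_{n\to\infty} E\{\hat{\Theta}\} = \theta \quad \text{and} \quad \lim_{n\to\infty} \operatorname{var}\{\hat{\Theta}\} = 0, \qquad (9.48)$$

estimator $\hat{\Theta}$ is a consistent estimator for θ. 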

The proof of Theorem 9.3 is essentially given in Example 9.6 and will not be 
repeated here. 
9.2.4 SUFFICIENCY 

Let $X_1, X_2, \ldots, X_n$ be a sample of a population X the distribution of which 
depends on unknown parameter θ. If $Y = h(X_1, X_2, \ldots, X_n)$ is a statistic such 
that, for any other statistic 

$$Z = g(X_1, X_2, \ldots, X_n),$$

the conditional distribution of Z, given that Y = y, does not depend on θ, then 
Y is called a sufficient statistic for θ. If also $E\{Y\} = \theta$, then Y is said to be a 
sufficient estimator for θ. 

In words, the definition for sufficiency states that, if Y is a sufficient statistic 
for θ, all sample information concerning θ is contained in Y. A sufficient 
statistic is thus of interest in that, if it can be found for a parameter, then an 
estimator based on this statistic is able to make use of all the information that 
the sample contains regarding the value of the unknown parameter. Moreover, 
an important property of a sufficient estimator is that, starting with any 
unbiased estimator of a parameter θ that is not a function of the sufficient 
estimator, it is possible to find an unbiased estimator based on the sufficient 
statistic that has a variance no larger than that of the initial estimator. Sufficient 
estimators thus have variances that are no larger than those of any other unbiased 
estimators that do not depend on sufficient statistics. 

If a sufficient statistic for a parameter θ exists, Theorem 9.4, stated here 
without proof, provides an easy way of finding it. 

Theorem 9.4: Fisher-Neyman factorization criterion. Let 

$$Y = h(X_1, X_2, \ldots, X_n)$$

be a statistic based on a sample of size n. Then Y is a sufficient statistic for 
θ if and only if the joint probability density function of $X_1, X_2, \ldots,$ and 
$X_n$, $f_X(x_1;\theta) \cdots f_X(x_n;\theta)$, can be factorized in the form 

$$\prod_{j=1}^{n} f_X(x_j;\theta) = g_1[h(x_1, \ldots, x_n), \theta]\, g_2(x_1, \ldots, x_n). \qquad (9.49)$$

If X is discrete, we have 

$$\prod_{j=1}^{n} p_X(x_j;\theta) = g_1[h(x_1, \ldots, x_n), \theta]\, g_2(x_1, \ldots, x_n). \qquad (9.50)$$

The sufficiency of the factorization criterion was first pointed out by Fisher 
(1922). Neyman (1935) showed that it is also necessary. 
The foregoing results can be extended to the multiple parameter case. Let 
$\boldsymbol{\theta}^T = [\theta_1 \; \cdots \; \theta_m]$, $m < n$, be the parameter vector. Then $Y_1 = h_1(X_1, \ldots, X_n), \ldots,$ 
$Y_r = h_r(X_1, \ldots, X_n)$, $r \ge m$, is a set of sufficient statistics for θ if and only if 

$$\prod_{j=1}^{n} f_X(x_j;\boldsymbol{\theta}) = g_1[\mathbf{h}(x_1, \ldots, x_n), \boldsymbol{\theta}]\, g_2(x_1, \ldots, x_n), \qquad (9.51)$$

where $\mathbf{h}^T = [h_1 \; \cdots \; h_r]$. A similar expression holds when X is discrete. 

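Example 9.7. Let us show that statistic $\bar{X}$ is a sufficient statistic for θ in 
Example 9.5. In this case, 

$$\prod_{j=1}^{n} p_X(x_j;\theta) = \prod_{j=1}^{n} \theta^{x_j}(1-\theta)^{1-x_j} = \theta^{\sum x_j}(1-\theta)^{n - \sum x_j}. \qquad (9.52)$$

We see that the joint probability mass function (jpmf) is a function of $\sum x_j$ and 
θ. If we let 

$$Y = \sum_{j=1}^{n} X_j,$$

the jpmf of $X_1, \ldots,$ and $X_n$ takes the form given by Equation (9.50), with 

$$g_1 = \theta^{\sum x_j}(1-\theta)^{n - \sum x_j}$$

and 

$$g_2 = 1.$$

In this example, 

$$Y = \sum_{j=1}^{n} X_j$$

is thus a sufficient statistic for θ, and so is $\bar{X} = Y/n$, a one-to-one function of Y. We have seen in Example 9.5 that both $\hat{\Theta}_1$ and 
$\hat{\Theta}_2$, where $\hat{\Theta}_1 = \bar{X}$ and $\hat{\Theta}_2 = (n\bar{X} + 1)/(n + 2)$, are based on this sufficient 
statistic. Furthermore, $\hat{\Theta}_1$, being unbiased, is a sufficient estimator for θ. 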
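For Example 9.7, the definition of sufficiency can be checked directly by brute-force enumeration. The sketch below uses exact fractions to compute the conditional pmf of $(X_1, \ldots, X_n)$ given $Y = y$, and does so for two different values of θ. The choices n = 4 and y = 2, the θ values, and the function name are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_pmf_given_sum(n, y, theta):
    """P{X1 = x1, ..., Xn = xn | sum of X_j = y} for Bernoulli(theta), by enumeration."""
    def joint(xs):
        s = sum(xs)
        return theta**s * (1 - theta) ** (n - s)   # jpmf of Equation (9.52)
    samples = [xs for xs in product((0, 1), repeat=n) if sum(xs) == y]
    p_y = sum(joint(xs) for xs in samples)          # P{Y = y}
    return {xs: joint(xs) / p_y for xs in samples}

for theta in (Fraction(1, 5), Fraction(2, 3)):
    print(theta, conditional_pmf_given_sum(4, 2, theta))
```

Both calls should return the same distribution, uniform over the comb(4, 2) = 6 arrangements. The conditional distribution therefore carries no information about θ.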

Example 9.8. Suppose $X_1, X_2, \ldots,$ and $X_n$ are a sample taken from a Poisson 
distribution; that is, 

$$p_X(k;\theta) = \frac{\theta^k e^{-\theta}}{k!}, \qquad k = 0, 1, 2, \ldots, \qquad (9.53)$$

where θ is the unknown parameter. We have 

$$\prod_{j=1}^{n} p_X(x_j;\theta) = \frac{\theta^{\sum x_j} e^{-n\theta}}{\prod x_j!},$$

which can be factorized in the form of Equation (9.50) by letting 

$$g_1 = \theta^{\sum x_j} e^{-n\theta} \quad \text{and} \quad g_2 = \frac{1}{\prod x_j!}. \qquad (9.54)$$

It is seen that 

$$Y = \sum_{j=1}^{n} X_j$$

is a sufficient statistic for θ. 


9.3 METHODS OF ESTIMATION 

Based on the estimation criteria defined in Section 9.2, some estimation 
techniques that yield ‘good’, and sometimes ‘best’, estimates of distribution 
parameters are now developed. 

Two approaches to the parameter estimation problem are discussed in what 
follows: point estimation and interval estimation. In point estimation, we use 
certain prescribed methods to arrive at a value for $\hat{\Theta}$ as a function of the 
observed data that we accept as a ‘good’ estimate of θ, good in terms of 
unbiasedness, minimum variance, etc., as defined by the estimation criteria. 

In many scientific studies it is more useful to obtain information about a 
parameter beyond a single number as its estimate. Interval estimation is a 
procedure by which bounds on the parameter value are obtained that not only 
give information on the numerical value of the parameter but also give an 
indication of the level of confidence one can place on the possible numerical 
value of the parameter on the basis of a sample. Point estimation will be 
discussed first, followed by the development of methods of interval estimation. 


9.3.1 POINT ESTIMATION 

We now proceed to present two general methods of finding point estimators for 
distribution parameters on the basis of a sample from a population. 
9.3.1.1 Method of Moments 

The oldest systematic method of point estimation was proposed by Pearson 
(1894) and was extensively used by him and his co-workers. It was neglected for 
a number of years because of its general lack of optimum properties and 
because of the popularity and universal appeal associated with the method of 
maximum likelihood, to be discussed in Section 9.3.1.2. The moment method, 
however, appears to be regaining its acceptance, primarily because of its 
expediency in terms of computational labor and the fact that it can be improved 
upon easily in certain cases. 

The method of moments is simple in concept. Consider a selected probability 
density function $f(x;\theta_1, \theta_2, \ldots, \theta_m)$ for which parameters $\theta_j$, $j = 1, 2, \ldots, m$, are 
to be estimated based on sample $X_1, X_2, \ldots, X_n$ of X. The theoretical or 
population moments of X are 

$$\alpha_i = \int_{-\infty}^{\infty} x^i f(x;\theta_1, \ldots, \theta_m)\, dx, \qquad i = 1, 2, \ldots. \qquad (9.55)$$

They are, in general, functions of the unknown parameters; that is, 

$$\alpha_i = \alpha_i(\theta_1, \theta_2, \ldots, \theta_m). \qquad (9.56)$$

The sample moments of various orders, on the other hand, can be found from the sample by 
[see Equation (9.14)] 

$$M_i = \frac{1}{n} \sum_{j=1}^{n} X_j^i, \qquad i = 1, 2, \ldots. \qquad (9.57)$$


The method of moments suggests that, in order to determine estimators $\hat{\Theta}_1, \ldots,$ 
and $\hat{\Theta}_m$ from the sample, we equate a sufficient number of sample moments to 
the corresponding population moments. By establishing and solving as many 
resulting moment equations as there are parameters to be estimated, estimators 
for the parameters are obtained. Hence, the procedure for determining 
$\hat{\Theta}_1, \hat{\Theta}_2, \ldots,$ and $\hat{\Theta}_m$ consists of the following steps: 

• Step 1: let 

$$\alpha_i(\hat{\Theta}_1, \ldots, \hat{\Theta}_m) = M_i, \qquad i = 1, 2, \ldots, m. \qquad (9.58)$$

These yield m moment equations in m unknowns $\hat{\Theta}_j$, $j = 1, \ldots, m$. 

• Step 2: solve for $\hat{\Theta}_j$, $j = 1, \ldots, m$, from this system of equations. These are 
called the moment estimators for $\theta_1, \ldots,$ and $\theta_m$. 
Let us remark that it is not necessary to consider m consecutive moment 
equations as indicated by Equations (9.58); any convenient set of m equations that 
leads to the solution for $\hat{\Theta}_j$, $j = 1, \ldots, m$, is sufficient. Lower-order moment 
equations are preferred, however, since they require less manipulation of observed data. 

An attractive feature of the method of moments is that the moment equations 
are straightforward to establish, and there is seldom any difficulty in solving 
them. However, a shortcoming is that such desirable properties as unbiasedness 
or efficiency are not generally guaranteed for estimators so obtained. 

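On the other hand, consistency of moment estimators can be established under general 
conditions. In order to show this, let us consider a single parameter θ whose 
moment estimator $\hat{\Theta}$ satisfies the moment equation 

$$\alpha_i(\hat{\Theta}) = M_i, \qquad (9.59)$$

for some i. The solution of Equation (9.59) for $\hat{\Theta}$ can be represented by 
$\hat{\Theta} = \theta(M_i)$, for which the Taylor's expansion about $\alpha_i(\theta)$ gives 

$$\hat{\Theta} = \theta + \theta^{(1)}[\alpha_i(\theta)]\,[M_i - \alpha_i(\theta)] + \frac{1}{2!}\theta^{(2)}[\alpha_i(\theta)]\,[M_i - \alpha_i(\theta)]^2 + \cdots, \qquad (9.60)$$

where superscript (k) denotes the kth derivative with respect to $M_i$. Upon 
performing successive differentiations of Equation (9.59) with respect to $M_i$, 
Equation (9.60) becomes 

$$\hat{\Theta} - \theta = [M_i - \alpha_i(\theta)] \left[ \frac{d\alpha_i(\theta)}{d\theta} \right]^{-1} - \frac{1}{2}[M_i - \alpha_i(\theta)]^2 \left[ \frac{d^2\alpha_i(\theta)}{d\theta^2} \right] \left[ \frac{d\alpha_i(\theta)}{d\theta} \right]^{-3} + \cdots. \qquad (9.61)$$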


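The bias and variance of $\hat{\Theta}$ can be found by taking the expectation of 
Equation (9.61) and the expectation of the square of Equation (9.61), 
respectively. Up to the order of 1/n, we find 

$$E\{\hat{\Theta}\} = \theta - \frac{1}{2n}(\alpha_{2i} - \alpha_i^2) \left[ \frac{d^2\alpha_i(\theta)}{d\theta^2} \right] \left[ \frac{d\alpha_i(\theta)}{d\theta} \right]^{-3},$$
$$\operatorname{var}\{\hat{\Theta}\} = \frac{1}{n}(\alpha_{2i} - \alpha_i^2) \left[ \frac{d\alpha_i(\theta)}{d\theta} \right]^{-2}. \qquad (9.62)$$

Assuming that all the indicated moments and their derivatives exist, Equations 
(9.62) show that 

$$\lim_{n\to\infty} E\{\hat{\Theta}\} = \theta \quad \text{and} \quad \lim_{n\to\infty} \operatorname{var}\{\hat{\Theta}\} = 0,$$

and hence, by Theorem 9.3, $\hat{\Theta}$ is consistent. 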

Example 9.9. Problem: let us select the normal distribution as a model for the 
percentage yield discussed in Chapter 8; that is, 

$$f(x;m,\sigma^2) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[ -\frac{(x-m)^2}{2\sigma^2} \right], \qquad -\infty < x < \infty. \qquad (9.63)$$

Estimate parameters $\theta_1 = m$ and $\theta_2 = \sigma^2$, based on the 200 sample values given 
in Table 8.1, page 249. 

Answer: following the method of moments, we need two moment equations, 
and the most convenient ones are obviously 

$$\alpha_1 = M_1 = \bar{X}$$

and 

$$\alpha_2 = M_2.$$

Hence, the first of these moment equations gives 

$$\hat{\Theta}_1 = \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j. \qquad (9.64)$$

The properties of this estimator have already been discussed in Example 9.2. It 
is unbiased and has minimum variance among all unbiased estimators for m. 
We see that the method of moments produces desirable results in this case. 
Since $\alpha_2 = \theta_2 + \theta_1^2$, the second moment equation gives 

$$\hat{\Theta}_2 + \hat{\Theta}_1^2 = M_2 = \frac{1}{n} \sum_{j=1}^{n} X_j^2,$$

or 

$$\hat{\Theta}_2 = M_2 - M_1^2 = \frac{1}{n} \sum_{j=1}^{n} (X_j - \bar{X})^2. \qquad (9.65)$$

This, as we have shown, is a biased estimator for $\sigma^2$. 
Estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ of $\theta_1 = m$ and $\theta_2 = \sigma^2$ based on the sample values given 
by Table 8.1 are, following Equations (9.64) and (9.65), 

$$\hat{\theta}_1 = \frac{1}{200} \sum_{j=1}^{200} x_j \cong 70,$$
$$\hat{\theta}_2 = \frac{1}{200} \sum_{j=1}^{200} (x_j - \hat{\theta}_1)^2,$$

where $x_j$, $j = 1, 2, \ldots, 200$, are sample values given in Table 8.1. 
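Equations (9.64) and (9.65) take only a few lines of Python. The sketch below is a minimal implementation for a list of observations. Table 8.1 is not reproduced in this section, so the four yield values in the example call are hypothetical.

```python
def normal_moment_estimates(xs):
    """Moment estimates (theta1_hat, theta2_hat) of (m, sigma^2), Equations (9.64)-(9.65)."""
    n = len(xs)
    theta1 = sum(xs) / n                                # M1, the sample mean
    # M2 - M1^2, evaluated in the algebraically equal centered form for numerical stability
    theta2 = sum((x - theta1) ** 2 for x in xs) / n    # divisor n, hence biased
    return theta1, theta2

theta1_hat, theta2_hat = normal_moment_estimates([68.0, 70.0, 72.0, 74.0])  # hypothetical yields
print(theta1_hat, theta2_hat)   # 71.0 5.0
```

The centered sum is used instead of $M_2 - M_1^2$ because subtracting two large, nearly equal numbers loses precision when the data values are large relative to their spread.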

Example 9.10. Problem: consider the binomial distribution 

$$p_X(k;p) = p^k (1-p)^{1-k}, \qquad k = 0, 1. \qquad (9.66)$$

Estimate parameter p based on a sample of size n. 

Answer: the method of moments suggests that we determine the estimator for 
p, $\hat{P}$, by equating $\alpha_1$ to $M_1 = \bar{X}$. Since 

$$\alpha_1 = E\{X\} = p,$$

we have 

$$\hat{P} = \bar{X}. \qquad (9.67)$$

The mean of $\hat{P}$ is 

$$E\{\hat{P}\} = \frac{1}{n} \sum_{j=1}^{n} E\{X_j\} = p. \qquad (9.68)$$

Hence it is an unbiased estimator. Its variance is given by 

$$\operatorname{var}\{\hat{P}\} = \operatorname{var}\{\bar{X}\} = \frac{\sigma^2}{n} = \frac{p(1-p)}{n}. \qquad (9.69)$$

It is easy to derive the CRLB for this case and show that $\hat{P}$ defined by Equation 
(9.67) is also efficient. 
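The closing claim of Example 9.10 can be verified exactly. The sketch below enumerates k = 0, 1 to compute $E\{[\partial \ln p_X(X;p)/\partial p]^2\} = 1/[p(1-p)]$. It then forms the CRLB from Equation (9.26) and divides it by the variance in Equation (9.69). The function names and the sample parameter values are assumptions for illustration.

```python
from fractions import Fraction

def bernoulli_fisher_information(p):
    """E{[d ln p_X(X; p) / dp]^2}, enumerated over k = 0, 1 with exact fractions."""
    total = Fraction(0)
    for k in (0, 1):
        pmf = p**k * (1 - p) ** (1 - k)
        score = Fraction(k) / p - Fraction(1 - k) / (1 - p)   # d/dp [k ln p + (1-k) ln(1-p)]
        total += pmf * score**2
    return total

def efficiency_of_sample_mean(p, n):
    """CRLB of Equation (9.26) divided by var{P-hat} of Equation (9.69)."""
    crlb = 1 / (n * bernoulli_fisher_information(p))
    var_p_hat = p * (1 - p) / n
    return crlb / var_p_hat

print(bernoulli_fisher_information(Fraction(1, 4)))   # 16/3
print(efficiency_of_sample_mean(Fraction(1, 4), 10))  # 1
```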

Example 9.11. Problem: a set of 214 observed gaps in traffic on a section of 
Arroyo Seco Freeway is given in Table 9.1. If the exponential density function 

$$f(t;\lambda) = \lambda e^{-\lambda t}, \qquad t \ge 0, \qquad (9.70)$$

is proposed for the gap, determine parameter λ from the data. 
Table 9.1 Observed traffic gaps on Arroyo Seco Freeway, 
for Example 9.11 (Source: Gerlough, 1955) 

Gap length (s)   Gaps (No.)   Gap length (s)   Gaps (No.)
0-1              18           16-17            6
1-2              25           17-18            4
2-3              21           18-19            3
3-4              13           19-20            3
4-5              11           20-21            1
5-6              15           21-22            1
6-7              16           22-23            1
7-8              12           23-24            0
8-9              11           24-25            1
9-10             11           25-26            0
10-11            8            26-27            1
11-12            12           27-28            1
12-13            6            28-29            1
13-14            3            29-30            2
14-15            3            30-31            1
15-16            3



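Answer: in this case, 

$$\alpha_1 = \frac{1}{\lambda},$$

and, following the method of moments, the simplest estimator, $\hat{\Lambda}$, for λ is 
obtained from 

$$\alpha_1 = \bar{X}, \quad \text{or} \quad \hat{\Lambda} = \frac{1}{\bar{X}}. \qquad (9.71)$$

Hence, the desired estimate is 

$$\hat{\lambda} = \frac{214}{18(0.5) + 25(1.5) + \cdots + 1(30.5)} = 0.13\ \text{s}^{-1}, \qquad (9.72)$$

where each gap has been assigned the midpoint of its interval in Table 9.1. 

Let us note that, although $\bar{X}$ is an unbiased estimator for $\alpha_1$, the estimator 
for λ obtained above is not unbiased since 

$$E\{\hat{\Lambda}\} = E\left\{ \frac{1}{\bar{X}} \right\} \ne \frac{1}{E\{\bar{X}\}}.$$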
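The arithmetic of Equation (9.72) is easy to reproduce from Table 9.1. The sketch below gives each gap the midpoint of its one-second interval, as the text does, and returns $1/\bar{x}$. The list name `gap_counts` and the function name are hypothetical.

```python
# Number of gaps in each one-second interval of Table 9.1: 0-1, 1-2, ..., 30-31 s
gap_counts = [18, 25, 21, 13, 11, 15, 16, 12, 11, 11, 8, 12, 6, 3, 3, 3,
              6, 4, 3, 3, 1, 1, 1, 0, 1, 0, 1, 1, 1, 2, 1]

def moment_estimate_lambda(counts, width=1.0):
    """Moment estimate 1/x-bar, using class midpoints as in Equation (9.72)."""
    n = sum(counts)
    total = sum(c * (i + 0.5) * width for i, c in enumerate(counts))
    return n / total

print(sum(gap_counts), moment_estimate_lambda(gap_counts))
```

The weighted sum in the denominator comes to 1640 s, so the estimate is 214/1640 ≈ 0.13 s⁻¹.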




Example 9.12. Suppose that population X has a uniform distribution over the 
range (0, θ) and we wish to estimate parameter θ from a sample of size n. 

The density function of X is 

$$f(x;\theta) = \begin{cases} \dfrac{1}{\theta}, & \text{for } 0 \le x \le \theta; \\ 0, & \text{elsewhere;} \end{cases} \qquad (9.73)$$

and the first moment is 

$$\alpha_1 = \frac{\theta}{2}. \qquad (9.74)$$

It follows from the method of moments that, on letting $\alpha_1 = \bar{X}$, we obtain 

$$\hat{\Theta} = 2\bar{X} = \frac{2}{n} \sum_{j=1}^{n} X_j. \qquad (9.75)$$

Upon a little reflection, the validity of this estimator is somewhat questionable 
because, by definition, all values assumed by X are supposed to lie within 
interval (0, θ). However, we see from Equation (9.75) that it is possible that 
some of the samples are greater than $\hat{\Theta}$. Intuitively, a better estimator might be 

$$\hat{\Theta} = X_{(n)}, \qquad (9.76)$$

where $X_{(n)}$ is the nth-order statistic. As we will see, this would be the outcome 
following the method of maximum likelihood, to be discussed in the next 
section. 
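A tiny numerical case shows the difficulty with Equation (9.75). Take the hypothetical observations 0.1, 0.2 and 0.9. The moment estimate $2\bar{x} = 0.8$ is then smaller than one of the observed values, which cannot happen for the true θ. The order statistic $x_{(n)} = 0.9$, in contrast, respects the data. The function name below is an assumption for illustration.

```python
def uniform_estimates(xs):
    """Return (2 * x-bar, max(xs)): Equations (9.75) and (9.76) for U(0, theta)."""
    moment = 2 * sum(xs) / len(xs)   # moment estimate, Equation (9.75)
    order_stat = max(xs)             # x_(n), Equation (9.76)
    return moment, order_stat

sample = [0.1, 0.2, 0.9]             # hypothetical observations
moment, order_stat = uniform_estimates(sample)
print(moment, order_stat)            # the moment estimate falls below an observed value
```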

Since the method of moments requires only $\alpha_i$, the moments of population X, 
knowledge of its pdf is not necessary. This advantage is demonstrated in 
Example 9.13. 

Example 9.13. Problem: consider measuring the length r of an object with use 
of a sensing instrument. Owing to inherent inaccuracies in the instrument, what 
is actually measured is X, as shown in Figure 9.3, where $X_1$ and $X_2$ are 
identically and normally distributed with mean zero and unknown variance 
$\sigma^2$. Determine a moment estimator $\hat{\Theta}$ for $\theta = r^2$ on the basis of a sample of size 
n from X. 

Answer: now, random variable X is 

$$X = [(r + X_1)^2 + X_2^2]^{1/2}. \qquad (9.77)$$

Figure 9.3 Measurement X, for Example 9.13 

The pdf of X with unknown parameters θ and $\sigma^2$ can be found by using 
techniques developed in Chapter 5. It is, however, unnecessary here since some 
moments of X can be directly generated from Equation (9.77). We remark that, 
although an estimator for $\sigma^2$ is not required, it is nevertheless an unknown 
parameter and must be considered together with θ. In the applied literature, an 
unknown parameter for which the value is of no interest is sometimes referred 
to as a nuisance parameter. 

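Two moment equations are needed in this case. However, we see from 
Equation (9.77) that the odd-order moments of X are quite complicated. For 
simplicity, the second-order and fourth-order moment equations will be used. 
We easily obtain from Equation (9.77) 

$$\alpha_2 = \theta + 2\sigma^2,$$
$$\alpha_4 = \theta^2 + 8\theta\sigma^2 + 8\sigma^4. \qquad (9.78)$$

The two moment equations are 

$$\hat{\Theta} + 2\hat{\Sigma}^2 = M_2,$$
$$\hat{\Theta}^2 + 8\hat{\Theta}\hat{\Sigma}^2 + 8\hat{\Sigma}^4 = M_4. \qquad (9.79)$$

Eliminating $\hat{\Sigma}^2$ between Equations (9.79) gives $M_4 = 2M_2^2 - \hat{\Theta}^2$; solving for $\hat{\Theta}$, we have 

$$\hat{\Theta} = (2M_2^2 - M_4)^{1/2}. \qquad (9.80)$$

Incidentally, a moment estimator $\hat{\Sigma}^2$ for $\sigma^2$, if needed, is obtained from 
Equations (9.79) to be 

$$\hat{\Sigma}^2 = \frac{1}{2}(M_2 - \hat{\Theta}). \qquad (9.81)$$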
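The moment estimators of Equations (9.80) and (9.81) can be tried on simulated measurements. The sketch below treats $X_1$ and $X_2$ as independent, which is how the product moments behind Equation (9.78) factor. The values r = 3 and σ = 0.5, the sample size, and the function names are hypothetical.

```python
import math
import random

def moment_estimates(xs):
    """Moment estimates (theta_hat, sigma2_hat) from Equations (9.80) and (9.81)."""
    n = len(xs)
    m2 = sum(x**2 for x in xs) / n
    m4 = sum(x**4 for x in xs) / n
    # Sampling error can make 2 M2^2 - M4 slightly negative, so it is clipped at zero
    theta_hat = math.sqrt(max(2 * m2**2 - m4, 0.0))
    sigma2_hat = 0.5 * (m2 - theta_hat)
    return theta_hat, sigma2_hat

def simulate_measurements(r, sigma, n, seed=5):
    """Draws of X = [(r + X1)^2 + X2^2]^(1/2), X1 and X2 independent N(0, sigma^2)."""
    rng = random.Random(seed)
    return [math.hypot(r + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n)]

xs = simulate_measurements(r=3.0, sigma=0.5, n=100000)
theta_hat, sigma2_hat = moment_estimates(xs)
print(theta_hat, sigma2_hat)   # should be near r^2 = 9 and sigma^2 = 0.25
```

With a sample this large the estimates should land close to the true values. In small samples the clipping step matters more, and the estimates are correspondingly noisier.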

Combined Moment Estimators. Let us take another look at Example 9.11 for 
the purpose of motivating the following development. In this example, an 
estimator for λ has been obtained by using the first-order moment equation. 
Based on the same sample, one can obtain additional moment estimators for 
λ by using higher-order moment equations. For example, since $\alpha_2 = 2/\lambda^2$, the 
second-order moment equation, 

$$\alpha_2 = M_2,$$

produces a moment estimator $\hat{\Lambda}$ for λ in the form 

$$\hat{\Lambda} = \left( \frac{2}{M_2} \right)^{1/2}. \qquad (9.82)$$

Although this estimator may be inferior to $1/\bar{X}$ in terms of the quality criteria 
we have established, an interesting question arises: given two or more moment 
estimators, can they be combined to yield an estimator superior to any of the 
individual moment estimators? 

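In what follows, we consider a combined moment estimator derived from an 
optimal linear combination of a set of moment estimators. Let $\hat{\Theta}^{(1)}, \ldots,$ and 
$\hat{\Theta}^{(p)}$ be p moment estimators for the same parameter θ. We seek a combined 
estimator $\hat{\Theta}_p^*$ in the form 

$$\hat{\Theta}_p^* = w_1 \hat{\Theta}^{(1)} + \cdots + w_p \hat{\Theta}^{(p)}, \qquad (9.83)$$

where coefficients $w_1, \ldots,$ and $w_p$ are to be chosen in such a way that $\hat{\Theta}_p^*$ is 
unbiased if $\hat{\Theta}^{(j)}$, $j = 1, 2, \ldots, p$, are unbiased and the variance of $\hat{\Theta}_p^*$ is minimized. 

The unbiasedness condition requires that 

$$w_1 + \cdots + w_p = 1. \qquad (9.84)$$

We thus wish to determine coefficients $w_j$ by minimizing 

$$e = \operatorname{var}\{\hat{\Theta}_p^*\} = \operatorname{var}\left\{ \sum_{j=1}^{p} w_j \hat{\Theta}^{(j)} \right\}, \qquad (9.85)$$

subject to Equation (9.84). 

Let $\mathbf{u}^T = [1 \; \cdots \; 1]$, $\hat{\boldsymbol{\Theta}}^T = [\hat{\Theta}^{(1)} \; \cdots \; \hat{\Theta}^{(p)}]$, and $\mathbf{w}^T = [w_1 \; \cdots \; w_p]$. 
Equations (9.84) and (9.85) can be written in the vector-matrix form 

$$\mathbf{w}^T \mathbf{u} = 1, \qquad (9.86)$$

and 

$$Q(\mathbf{w}) = \operatorname{var}\left\{ \sum_{j=1}^{p} w_j \hat{\Theta}^{(j)} \right\} = \mathbf{w}^T \Lambda \mathbf{w}, \qquad (9.87)$$

where $\Lambda = [\Lambda_{ij}]$ with $\Lambda_{ij} = \operatorname{cov}\{\hat{\Theta}^{(i)}, \hat{\Theta}^{(j)}\}$. 

In order to minimize Equation (9.87) subject to Equation (9.86), we consider 

$$Q_1(\mathbf{w}) = \mathbf{w}^T \Lambda \mathbf{w} - \mathbf{w}^T \mathbf{u}\,\lambda - \lambda\, \mathbf{u}^T \mathbf{w} \qquad (9.88)$$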

var{0;} = w^kw = , (9.93) 

A H 

in view of Equation (9.92). 

Several attractive features are possessed by 0*. For example, we can show 
that its variance is smaller than or equal to that of any of the simple moment 
estimators = 1,2,... ,p, and furthermore (see Soong, 1969), 

var{0;} < var{0;}, (9.94) 

if f 
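Minimizing Equation (9.87) subject to Equation (9.86) with a Lagrange multiplier leads to the standard closed form $\mathbf{w} = \Lambda^{-1}\mathbf{u}/(\mathbf{u}^T\Lambda^{-1}\mathbf{u})$. Its variance is the value given in Equation (9.93). The sketch below computes these weights for a given covariance matrix Λ using plain Gaussian elimination. The example matrices are hypothetical, and in practice Λ would itself have to be estimated from the data.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix (copied)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def combined_weights(cov):
    """Weights minimizing w^T Lambda w subject to w^T u = 1, plus the minimum variance."""
    z = solve(cov, [1.0] * len(cov))      # z = Lambda^{-1} u
    s = sum(z)                            # u^T Lambda^{-1} u
    return [zi / s for zi in z], 1.0 / s  # weights and variance, Equation (9.93)

w, v = combined_weights([[1.0, 0.0], [0.0, 4.0]])      # uncorrelated estimators
print(w, v)
w2, v2 = combined_weights([[1.0, 0.5], [0.5, 2.0]])    # correlated estimators
print(w2, v2)
```

When the estimators are uncorrelated, the weights are inversely proportional to the individual variances. In both calls the combined variance (0.8 and 0.875) is below the smallest individual variance, which is 1.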

Example 9.14. Consider the problem of estimating parameter θ in the lognormal 
distribution 

$$f(x;\theta) = \frac{1}{x(2\pi\theta)^{1/2}} \exp\left( -\frac{1}{2\theta}\ln^2 x \right), \qquad x > 0, \; \theta > 0, \qquad (9.95)$$

from a sample of size n. 
Figure 9.4 Efficiencies of estimators in Example 9.14 as $n \to \infty$ 


Three moment estimators for θ, $\hat{\Theta}^{(1)}$, $\hat{\Theta}^{(2)}$, and $\hat{\Theta}^{(3)}$, can be found by means 
of establishing and solving the first three moment equations. Let $\hat{\Theta}_2^*$ be the 
combined moment estimator of $\hat{\Theta}^{(1)}$ and $\hat{\Theta}^{(2)}$, and let $\hat{\Theta}_3^*$ be the combined 
estimator of all three. As we have obtained the CRLB for the variance of any 
unbiased estimator for θ in Example 9.4, the efficiency of each of the above 
estimators can be calculated. Figure 9.4 shows these efficiencies as $n \to \infty$. As 
we can see, a significant increase in efficiency can result by means of combining 
even a small number of moment estimators. 


9.3.1.2 Method of Maximum Likelihood 

First introduced by Fisher in 1922, the method of maximum likelihood has 
become the most important general method of estimation from a theoretical 
point of view. Its greatest appeal stems from the fact that some very general 
properties associated with this procedure can be derived and, in the case of 
large samples, they are optimal properties in terms of the criteria set forth in 
Section 9.2. 
Let $f(x;\theta)$ be the density function of population X where, for simplicity, θ is 
the only parameter to be estimated from a set of sample values $x_1, x_2, \ldots, x_n$. 
The joint density function of the corresponding sample $X_1, X_2, \ldots, X_n$ has the 
form 

$$f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta).$$


We define the likelihood function L of a set of n sample values from the 
population by 

$$L(x_1, x_2, \ldots, x_n; \theta) = f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta). \qquad (9.96)$$

In the case when X is discrete, we write 

$$L(x_1, x_2, \ldots, x_n; \theta) = p(x_1;\theta) p(x_2;\theta) \cdots p(x_n;\theta). \qquad (9.97)$$


When the sample values are given, likelihood function L becomes a function 
of a single variable θ. The estimation procedure for θ based on the method of 
maximum likelihood consists of choosing, as an estimate of θ, the particular 
value of θ that maximizes L. The maximum of $L(\theta)$ occurs in most cases at the 
value of θ where $dL(\theta)/d\theta$ is zero. Hence, in a large number of cases, the 
maximum likelihood estimate (MLE) $\hat{\theta}$ of θ based on sample values $x_1, x_2, \ldots,$ 
and $x_n$ can be determined from 

$$\frac{dL(x_1, x_2, \ldots, x_n; \hat{\theta})}{d\hat{\theta}} = 0. \qquad (9.98)$$

As we see from Equations (9.96) and (9.97), function L is in the form of a 
product of many functions of θ. Since L is always nonnegative and attains its 
maximum for the same value of θ as ln L, it is generally easier to obtain MLE 
$\hat{\theta}$ by solving 

$$\frac{d \ln L(x_1, \ldots, x_n; \hat{\theta})}{d\hat{\theta}} = 0, \qquad (9.99)$$

because ln L is in the form of a sum rather than a product. 

Equation (9.99) is referred to as the likelihood equation. The desired solution 
is one where root $\hat{\theta}$ is a function of $x_j$, $j = 1, 2, \ldots, n$, if such a root exists. When 
several roots of Equation (9.99) exist, the MLE is the root corresponding to the 
global maximum of L or ln L. 
To see that this procedure is plausible, we observe that the quantity 

$$L(x_1, \ldots, x_n; \theta)\, dx_1\, dx_2 \cdots dx_n$$

is the probability that sample $X_1, X_2, \ldots, X_n$ takes values in the region defined 
by $(x_1, x_1 + dx_1], (x_2, x_2 + dx_2], \ldots, (x_n, x_n + dx_n]$. Given the sample values, this probability 
gives a measure of likelihood that they are from the population. By choosing 
a value of θ that maximizes L, or ln L, we in fact say that we prefer the value of 
θ that makes as probable as possible the event that the sample values indeed 
come from the population. 

The extension to the case of several parameters is straightforward. In the case 
of m parameters, the likelihood function becomes 

$$L(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m),$$

and the MLEs of $\theta_j$, $j = 1, \ldots, m$, are obtained by solving simultaneously the 
system of likelihood equations 

$$\frac{\partial \ln L}{\partial \hat{\theta}_j} = 0, \qquad j = 1, 2, \ldots, m. \qquad (9.100)$$

A discussion of some of the important properties associated with a maximum 
likelihood estimator is now in order. Let us represent the solution of the 
likelihood equation, Equation (9.99), by 

$$\hat{\theta} = h(x_1, x_2, \ldots, x_n). \qquad (9.101)$$

The maximum likelihood estimator $\hat{\Theta}$ for θ is then 

$$\hat{\Theta} = h(X_1, X_2, \ldots, X_n). \qquad (9.102)$$


The universal appeal enjoyed by maximum likelihood estimators stems from 
the optimal properties they possess when the sample size becomes large. Under 
mild regularity conditions imposed on the pdf or pmf of population X, two 
notable properties are given below, without proof. 

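Property 9.1: consistency and asymptotic efficiency. Let $\hat{\Theta}$ be the maximum 
likelihood estimator for θ in pdf $f(x;\theta)$ on the basis of a sample of size n. Then, 
as $n \to \infty$, 

$$E\{\hat{\Theta}\} \to \theta, \qquad (9.103)$$

and 

$$\operatorname{var}\{\hat{\Theta}\} \to \left\{ n E\left[ \left( \frac{\partial \ln f(X;\theta)}{\partial \theta} \right)^2 \right] \right\}^{-1}. \qquad (9.104)$$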
Analogous results are obtained when population X is discrete. Furthermore, 
the distribution of $\hat{\Theta}$ tends to a normal distribution as n becomes large. 

This important result shows that MLE $\hat{\Theta}$ is consistent (see Theorem 9.3). Since the variance 
given by Equation (9.104) is equal to the Cramer-Rao lower bound, it is 
efficient as n becomes large, or asymptotically efficient. The fact that MLE $\hat{\Theta}$ 
is normally distributed as $n \to \infty$ is also of considerable practical interest, as 
probability statements can be made regarding any observed value of a 
maximum likelihood estimator as n becomes large. 

Let us remark, however, that these important properties are large-sample properties. Unfortunately, very little can be said in the case of a small sample size; the MLE may then be biased and inefficient. This lack of reasonable small-sample properties can be explained in part by the fact that maximum likelihood estimation is based on finding the mode of a distribution by attempting to select the true parameter value. Estimators, in contrast, are generally designed to approach the true value rather than to produce an exact hit. Modes are therefore not as desirable as the mean or median when the sample size is small.
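A small Monte Carlo experiment can make both the large-sample result of Property 9.1 and the small-sample caveat above concrete. The sketch below rests on an illustrative assumption that is not an example from the text: an exponential population with pdf $\lambda e^{-\lambda x}$, for which the MLE of $\lambda$ is $1/\bar{X}$ and the CRLB is $\lambda^2/n$. It compares the simulated mean and variance of the MLE with $\lambda$ and with the CRLB. The function names and the values $\lambda = 2$, $n = 10$ and $n = 100$ are arbitrary choices.

```python
import random
import statistics


def mle_rate(sample):
    """MLE of lam for the exponential pdf lam * exp(-lam * x): 1 / sample mean."""
    return 1.0 / statistics.fmean(sample)


def simulate(lam, n, reps, seed=0):
    """Return (average MLE, variance of MLE / CRLB) over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = [
        mle_rate([rng.expovariate(lam) for _ in range(n)]) for _ in range(reps)
    ]
    crlb = lam ** 2 / n  # Cramer-Rao lower bound for unbiased estimators of lam
    return statistics.fmean(estimates), statistics.variance(estimates) / crlb


if __name__ == "__main__":
    for n in (10, 100):
        avg, ratio = simulate(lam=2.0, n=n, reps=10_000)
        print(f"n = {n:3d}: mean of MLE = {avg:.3f}, var/CRLB = {ratio:.3f}")
```

For this model the exact values are $E\{1/\bar{X}\} = n\lambda/(n-1)$ and $\operatorname{var}\{1/\bar{X}\}/\text{CRLB} = n^3/[(n-1)^2(n-2)]$, roughly 1.54 for $n = 10$ and 1.04 for $n = 100$. The simulated figures should land near these values: the MLE is biased and noticeably inefficient at $n = 10$ but close to efficient at $n = 100$.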

Property 9.2: invariance property. It can be shown that, if $\hat{\Theta}$ is the MLE of $\theta$, then the MLE of a function of $\theta$, say $g(\theta)$, is $g(\hat{\Theta})$, where $g(\theta)$ is assumed to represent a one-to-one transformation and be differentiable with respect to $\theta$.

This important invariance property implies that, for example, if $\hat{\Sigma}$ is the MLE of the standard deviation $\sigma$ in a distribution, then the MLE of the variance $\sigma^2$ is $\hat{\Sigma}^2$.

Let us also make an observation on the procedure for solving likelihood equations. Although it is fairly simple to establish Equation (9.99) or Equations (9.100), they are frequently highly nonlinear in the unknown estimates, and closed-form solutions for the MLE are sometimes difficult, if not impossible, to achieve. In many cases, iterations or numerical schemes are necessary.
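As a hedged illustration of such a numerical scheme, the sketch below solves the likelihood equations for the gamma pdf of Problem 9.27, $f(x;r,\lambda) = \lambda^r x^{r-1} e^{-\lambda x}/\Gamma(r)$, when both parameters are unknown. The $\lambda$ equation gives $\hat{\lambda} = \hat{r}/\bar{x}$. Substituting this into the $r$ equation leaves the single nonlinear equation $\ln r - \psi(r) = \ln \bar{x} - \frac{1}{n}\sum_j \ln x_j$, where $\psi$ is the digamma function. Because the left-hand side decreases monotonically in $r$, plain bisection suffices. The helper names, the bracketing interval and the tolerance are assumptions. The digamma routine is a simple recurrence-plus-asymptotic-series approximation, not a library call.

```python
import math
import random


def digamma(x):
    """Approximate psi(x) = d ln Gamma(x)/dx for x > 0.

    Uses psi(x) = psi(x + 1) - 1/x to shift x above 10, then the asymptotic
    series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def gamma_mle(sample, rel_tol=1e-10):
    """MLEs (r, lam) for the gamma pdf lam^r x^(r-1) exp(-lam x) / Gamma(r).

    The lambda equation gives lam = r / xbar; substituting it into the r
    equation leaves ln r - psi(r) = ln(xbar) - mean(ln x), solved by bisection.
    """
    n = len(sample)
    xbar = sum(sample) / n
    c = math.log(xbar) - sum(math.log(x) for x in sample) / n  # > 0 by Jensen

    def g(r):
        return math.log(r) - digamma(r) - c  # strictly decreasing in r

    lo, hi = 1e-3, 1e6  # assumed bracket: g(lo) > 0 > g(hi) for ordinary data
    while hi / lo > 1.0 + rel_tol:
        mid = math.sqrt(lo * hi)  # geometric midpoint: bisect on a log scale
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    r_hat = math.sqrt(lo * hi)
    return r_hat, r_hat / xbar


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]  # shape 3, scale 2
    print(gamma_mle(data))  # should be near (3, 0.5)
```

Here `random.gammavariate(3.0, 2.0)` draws from a gamma distribution with shape 3 and scale 2, that is, $r = 3$ and $\lambda = 0.5$, so a large simulated sample should give estimates close to those values.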

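Example 9.15. Let us consider Example 9.9 again and determine the MLEs of $m$ and $\sigma^2$. The logarithm of the likelihood function is

$$\ln L = -\frac{1}{2\sigma^2} \sum_{j=1}^{n} (x_j - m)^2 - \frac{1}{2} n \ln \sigma^2 - \frac{1}{2} n \ln 2\pi. \tag{9.105}$$

Let $\theta_1 = m$ and $\theta_2 = \sigma^2$, as before; the likelihood equations are

$$\frac{\partial \ln L}{\partial \theta_1} = \frac{1}{\theta_2} \sum_{j=1}^{n} (x_j - \theta_1) = 0,$$

$$\frac{\partial \ln L}{\partial \theta_2} = \frac{1}{2\theta_2^{2}} \sum_{j=1}^{n} (x_j - \theta_1)^2 - \frac{n}{2\theta_2} = 0.$$

Solving the above equations simultaneously, the MLEs of $m$ and $\sigma^2$ are found to be

$$\hat{\theta}_1 = \frac{1}{n} \sum_{j=1}^{n} x_j \quad \text{and} \quad \hat{\theta}_2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2.$$

The maximum likelihood estimators for $m$ and $\sigma^2$ are, therefore,

$$\hat{\Theta}_1 = \frac{1}{n} \sum_{j=1}^{n} X_j = \bar{X}, \qquad \hat{\Theta}_2 = \frac{1}{n} \sum_{j=1}^{n} (X_j - \bar{X})^2 = \frac{n-1}{n} S^2, \tag{9.106}$$

which coincide with their moment estimators in this case. Although $\hat{\Theta}_2$ is biased, consistency and asymptotic efficiency for both $\hat{\Theta}_1$ and $\hat{\Theta}_2$ can be easily verified.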
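The estimators in Equation (9.106) are simple to compute. The sketch below uses a hypothetical function name and evaluates them for the five observations (3, 2, 1.5, 0.5, 2.1) that appear in Section 9.3.2. It relies on the fact that `statistics.pvariance` divides by $n$ and `statistics.variance` divides by $n - 1$.

```python
import statistics


def normal_mle(sample):
    """MLEs of m and sigma^2 for N(m, sigma^2), Equation (9.106)."""
    m_hat = statistics.fmean(sample)
    var_hat = statistics.pvariance(sample)  # divisor n: the (biased) MLE
    return m_hat, var_hat


data = [3, 2, 1.5, 0.5, 2.1]
m_hat, var_hat = normal_mle(data)
s2 = statistics.variance(data)  # unbiased sample variance S^2, divisor n - 1
print(m_hat, var_hat, s2)
```

Up to floating-point rounding this gives $\hat{\theta}_1 = 1.82$, $\hat{\theta}_2 = 0.6696$ and $s^2 = 0.837$, so $\hat{\theta}_2 = \frac{4}{5}s^2$, as Equation (9.106) requires for $n = 5$.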

Example 9.16. Let us determine the MLE of $\theta$ considered in Example 9.12. Now,

$$f(x;\theta) = \begin{cases} \dfrac{1}{\theta}, & \text{for } 0 \le x \le \theta; \\[4pt] 0, & \text{elsewhere.} \end{cases} \tag{9.107}$$

The likelihood function becomes

$$L(x_1, x_2, \ldots, x_n; \theta) = \left(\frac{1}{\theta}\right)^{n}, \qquad 0 \le x_i \le \theta, \text{ for all } i. \tag{9.108}$$

A plot of $L$ is given in Figure 9.5. However, we note from the condition associated with Equation (9.108) that all sample values $x_i$ must be smaller than or equal to $\theta$, implying that only the portion of the curve to the right of $\max(x_1, \ldots, x_n)$ is applicable. Hence, the maximum of $L$ occurs at $\theta = \max(x_1, x_2, \ldots, x_n)$; that is, the MLE for $\theta$ is

$$\hat{\theta} = \max(x_1, x_2, \ldots, x_n), \tag{9.109}$$
Figure 9.5 Likelihood function, $L(\theta)$, for Example 9.16


and the maximum likelihood estimator for $\theta$ is

$$\hat{\Theta} = \max(X_1, X_2, \ldots, X_n) = X_{(n)}. \tag{9.110}$$


This estimator is seen to be different from that obtained by using the moment method [Equation (9.75)] and, as we already commented in Example 9.12, it is a more logical choice.

Let us also note that we did not obtain Equation (9.109) by solving the 
likelihood equation. The likelihood equation does not apply in this case as the 
maximum of L occurs at the boundary and the derivative is not zero there. 

It is instructive to study some of the properties of $\hat{\Theta}$ given by Equation (9.110). The pdf of $\hat{\Theta}$ is given by [see Equation (9.19)]

$$f_{\hat{\Theta}}(x) = n F_X^{\,n-1}(x) f_X(x). \tag{9.111}$$


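With $f_X(x)$ given by Equation (9.107) and

$$F_X(x) = \begin{cases} 0, & \text{for } x < 0; \\ \dfrac{x}{\theta}, & \text{for } 0 \le x \le \theta; \\ 1, & \text{for } x > \theta; \end{cases} \tag{9.112}$$

we have

$$f_{\hat{\Theta}}(x) = \begin{cases} \dfrac{n x^{n-1}}{\theta^{n}}, & \text{for } 0 \le x \le \theta; \\[4pt] 0, & \text{elsewhere.} \end{cases} \tag{9.113}$$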
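The mean and variance of $\hat{\Theta}$ are

$$E\{\hat{\Theta}\} = \int_0^{\theta} x f_{\hat{\Theta}}(x)\,dx = \frac{n}{n+1}\,\theta, \tag{9.114}$$

$$\operatorname{var}\{\hat{\Theta}\} = \frac{n\theta^2}{(n+1)^2(n+2)}. \tag{9.115}$$

We see that $\hat{\Theta}$ is biased but consistent: its mean tends to $\theta$ and its variance tends to zero as $n \to \infty$.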
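Equations (9.114) and (9.115) can be checked by simulation. The sketch below assumes the illustrative values $\theta = 2$ and $n = 5$, which are not from the text. It compares the Monte Carlo mean and variance of $\hat{\Theta} = X_{(n)}$ with the formulas. The function names are hypothetical.

```python
import random
import statistics


def simulate_max_mle(theta, n, reps, seed=0):
    """Monte Carlo mean and variance of max(X1, ..., Xn) for X uniform on (0, theta)."""
    rng = random.Random(seed)
    est = [max(rng.uniform(0.0, theta) for _ in range(n)) for _ in range(reps)]
    mean = statistics.fmean(est)
    var = statistics.fmean((e - mean) ** 2 for e in est) * reps / (reps - 1)
    return mean, var


def exact_moments(theta, n):
    """Mean and variance of the MLE from Equations (9.114) and (9.115)."""
    mean = n * theta / (n + 1)
    var = n * theta ** 2 / ((n + 1) ** 2 * (n + 2))
    return mean, var


if __name__ == "__main__":
    print("simulated:", simulate_max_mle(2.0, 5, 100_000))
    print("exact:    ", exact_moments(2.0, 5))
```

With 100,000 replications the simulated values should agree with $E\{\hat{\Theta}\} = 5\theta/6 \approx 1.667$ and $\operatorname{var}\{\hat{\Theta}\} = 20/252 \approx 0.0794$ to two or three decimal places.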

Example 9.17. Let us now determine the MLEs of $\theta$ and $\sigma^2$ in Example 9.13. To carry out this estimation procedure, it is now necessary to determine the pdf of $X$ given by Equation (9.77). Applying techniques developed in Chapter 5, we can show that $X$ is characterized by the Rice distribution with pdf given by (see Benedict and Soong, 1967)

$$f_X(x;\theta,\sigma^2) = \begin{cases} \dfrac{x}{\sigma^2} \exp\left[-\dfrac{x^2 + \theta^2}{2\sigma^2}\right] I_0\!\left(\dfrac{\theta x}{\sigma^2}\right), & \text{for } x \ge 0; \\[6pt] 0, & \text{elsewhere;} \end{cases} \tag{9.116}$$

where $I_0$ is the modified zeroth-order Bessel function of the first kind.

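Given a sample of size $n$ from population $X$, the likelihood function takes the form

$$L = \prod_{j=1}^{n} f_X(x_j;\theta,\sigma^2). \tag{9.117}$$

The MLEs of $\theta$ and $\sigma^2$, $\hat{\theta}$ and $\hat{\sigma}^2$, satisfy the likelihood equations

$$\frac{\partial \ln L}{\partial \theta} = 0 \quad \text{and} \quad \frac{\partial \ln L}{\partial \sigma^2} = 0, \tag{9.118}$$

which, upon simplifying, can be written as

$$1 = \frac{\hat{\sigma}^2}{n\hat{\theta}^2} \sum_{j=1}^{n} \frac{y_j I_1(y_j)}{I_0(y_j)} \tag{9.119}$$

and

$$\hat{\sigma}^2 = \frac{1}{2n} \sum_{j=1}^{n} x_j^2 - \frac{\hat{\theta}^2}{2}, \tag{9.120}$$

where $I_1$ is the modified first-order Bessel function of the first kind, and

$$y_j = \frac{\hat{\theta} x_j}{\hat{\sigma}^2}. \tag{9.121}$$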


As we can see, although likelihood equations can be established, they are complicated functions of $\hat{\theta}$ and $\hat{\sigma}^2$, and we must resort to numerical means for their solutions. As we have pointed out earlier, this difficulty is often encountered when using the method of maximum likelihood. Indeed, Example 9.13 shows that the method of moments offers considerable computational advantage in this case.
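One possible numerical scheme is sketched below under stated assumptions; it is not the procedure of Benedict and Soong (1967). The scheme is fixed-point iteration on the likelihood equations. The $\theta$ equation can be rearranged as $\hat{\theta} = \frac{1}{n}\sum_j x_j I_1(y_j)/I_0(y_j)$, and Equation (9.120) then updates $\hat{\sigma}^2$. The Bessel-function ratio is computed from its power series for moderate arguments and from the large-argument expansion $I_1(y)/I_0(y) \approx 1 - 1/(2y) - 1/(8y^2)$ otherwise. The starting values, tolerances and function names are all assumptions. Convergence is not guaranteed for every sample, and it can be slow when $\theta$ is small relative to $\sigma$.

```python
import math
import random


def bessel_ratio(y):
    """I1(y) / I0(y) for y >= 0.

    Power series for y < 50; above that, the large-argument expansion
    1 - 1/(2y) - 1/(8y^2), which avoids overflow in the series terms.
    """
    if y >= 50.0:
        return 1.0 - 0.5 / y - 0.125 / (y * y)
    q = 0.25 * y * y
    term0, term1 = 1.0, 0.5 * y  # k = 0 terms of I0 and I1
    s0, s1, k = term0, term1, 0
    while term0 > 1e-17 * s0:
        k += 1
        term0 *= q / (k * k)        # (y/2)^(2k) / (k!)^2
        term1 *= q / (k * (k + 1))  # (y/2)^(2k+1) / (k! (k+1)!)
        s0 += term0
        s1 += term1
    return s1 / s0


def rice_mle(x, tol=1e-9, max_iter=1000):
    """Fixed-point iteration on the Rice likelihood equations; returns (theta, sigma2).

    theta  <- (1/n) sum x_j I1(y_j) / I0(y_j), with y_j = theta x_j / sigma2,
    sigma2 <- (1/(2n)) sum x_j^2 - theta^2 / 2   (Equation (9.120)).
    """
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(v * v for v in x) / n
    theta = m1                                 # crude starting values (assumed)
    sigma2 = max((m2 - m1 * m1) / 2.0, 1e-12)
    for _ in range(max_iter):
        theta_new = sum(v * bessel_ratio(v * theta / sigma2) for v in x) / n
        sigma2_new = m2 / 2.0 - theta_new * theta_new / 2.0
        converged = abs(theta_new - theta) < tol and abs(sigma2_new - sigma2) < tol
        theta, sigma2 = theta_new, sigma2_new
        if converged:
            break
    return theta, sigma2


if __name__ == "__main__":
    rng = random.Random(3)
    # Simulated Rice data with theta = 3 and sigma = 1 (illustrative values)
    data = [math.hypot(3.0 + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            for _ in range(1000)]
    print(rice_mle(data))
```

The simulated data use the fact that the envelope $[(\theta + \sigma Z_1)^2 + (\sigma Z_2)^2]^{1/2}$ of two independent N(0, 1) variables $Z_1, Z_2$ has the pdf of Equation (9.116).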

The variances of the maximum likelihood estimators for $\theta$ and $\sigma^2$ can be obtained, in principle, from Equations (9.119) and (9.120). We can also show that their variances can be larger than those associated with the moment estimators obtained in Example 9.13 for moderate sample sizes (see Benedict and Soong, 1967). This observation serves to remind us again that, although maximum likelihood estimators possess optimal asymptotic properties, they may perform poorly when the sample size is small.


9.3.2 INTERVAL ESTIMATION

We now examine another approach to the problem of parameter estimation. As stated in the introductory text of Section 9.3, interval estimation provides, on the basis of a sample from a population, not only information on the parameter values to be estimated but also an indication of the level of confidence that can be placed on possible numerical values of the parameters. Before developing the theory of interval estimation, an example will be used to demonstrate that a method that appears to be almost intuitively obvious could lead to conceptual difficulties.

Suppose that five sample values, 3, 2, 1.5, 0.5, and 2.1, are observed from a normal distribution having an unknown mean $m$ and a known variance $\sigma^2 = 9$. From Example 9.15, we see that the MLE of $m$ is the sample mean $\bar{X}$, and thus

$$\hat{m} = \tfrac{1}{5}(3 + 2 + 1.5 + 0.5 + 2.1) = 1.82. \tag{9.122}$$

Our additional task is to determine the upper and lower limits of an interval such that, with a specified level of confidence, the true mean $m$ will lie in this interval.

The maximum likelihood estimator for $m$ is $\bar{X}$, which, being a linear combination of independent normal random variables, is normal with mean $m$ and variance $\sigma^2/n = 9/5$. The standardized random variable $U$, defined by

$$U = \frac{\bar{X} - m}{\sigma/n^{1/2}} = \frac{5^{1/2}(\bar{X} - m)}{3}, \tag{9.123}$$

is then N(0, 1) and it has pdf

$$f_U(u) = \frac{1}{(2\pi)^{1/2}} e^{-u^2/2}, \qquad -\infty < u < \infty. \tag{9.124}$$


Suppose we specify that the probability of $U$ being in interval $(-u_1, u_1)$ is equal to 0.95. From Table A.3 we find that $u_1 = 1.96$ and

$$P(-1.96 < U < 1.96) = \int_{-1.96}^{1.96} f_U(u)\,du = 0.95, \tag{9.125}$$

or, on substituting Equation (9.123) into Equation (9.125) and noting that $1.96 \times 3/5^{1/2} \approx 2.63$,

$$P(\bar{X} - 2.63 < m < \bar{X} + 2.63) = 0.95, \tag{9.126}$$

and, using Equation (9.122), the observed interval is

$$P(-0.81 < m < 4.45) = 0.95. \tag{9.127}$$


Equation (9.127) gives the desired result but it must be interpreted carefully. The mean $m$, although unknown, is nevertheless deterministic; it either lies in an interval or it does not. However, we see from Equation (9.126) that the interval is a function of statistic $\bar{X}$. Hence, the proper way to interpret Equations (9.126) and (9.127) is that the probability of the random interval $(\bar{X} - 2.63, \bar{X} + 2.63)$ covering the distribution's true mean $m$ is 0.95, and Equation (9.127) gives the observed interval based upon the given sample values.
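The frequency interpretation above can be checked by simulation. The sketch below assumes a hypothetical true mean $m = 1$ and keeps $\sigma = 3$ and $n = 5$ as in this example. It draws many samples and counts how often the random interval $(\bar{X} - 2.63, \bar{X} + 2.63)$ covers $m$. The fraction should be close to 0.95, while any single observed interval either contains $m$ or does not. The function name is hypothetical.

```python
import random


def coverage(m, sigma, n, half_width, reps=100_000, seed=0):
    """Fraction of simulated intervals (xbar - h, xbar + h) that contain m."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(m, sigma) for _ in range(n)) / n
        if xbar - half_width < m < xbar + half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical true mean m = 1; sigma = 3 and n = 5 as in the example
    print(coverage(m=1.0, sigma=3.0, n=5, half_width=2.63))
```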

Let us place the concept illustrated by the example above in a more general 
and precise setting, through Definition 9.2. 

Definition 9.2. Suppose that a sample $X_1, X_2, \ldots, X_n$ is drawn from a population having pdf $f(x;\theta)$, $\theta$ being the parameter to be estimated. Further suppose that $L_1(X_1, \ldots, X_n)$ and $L_2(X_1, \ldots, X_n)$ are two statistics such that $L_1 < L_2$ with probability 1. The interval $(L_1, L_2)$ is called a $[100(1-\alpha)]\%$ confidence interval for $\theta$ if $L_1$ and $L_2$ can be selected such that

$$P(L_1 < \theta < L_2) = 1 - \alpha. \tag{9.128}$$

Limits $L_1$ and $L_2$ are called, respectively, the lower and upper confidence limits for $\theta$, and $1 - \alpha$ is called the confidence coefficient. The value of $1 - \alpha$ is generally taken as 0.90, 0.95, 0.99, or 0.999.
We now make several remarks concerning the foregoing definition.

• Remark 1: as we see from Equation (9.126), confidence limits are functions of a given sample. The confidence interval thus will generally vary in position and width from sample to sample.

• Remark 2: for a given sample, confidence limits are not unique. In other words, many pairs of statistics $L_1$ and $L_2$ exist that satisfy Equation (9.128). For example, in addition to the pair $(-1.96, 1.96)$, there are many other pairs of values (not symmetric about zero) that could give the probability 0.95 in Equation (9.125). However, it is easy to see that this particular pair gives the minimum-width interval.

• Remark 3: in view of the above, it is desirable to define a set of quality criteria for interval estimators so that the ‘best’ interval can be obtained. Intuitively, the ‘best’ interval is the shortest interval. Moreover, since interval width $L = L_2 - L_1$ is a random variable, we may wish to choose ‘minimum expected interval width’ as a good criterion. Unfortunately, there may not exist statistics $L_1$ and $L_2$ that give rise to an expected interval width that is minimum for all values of $\theta$.

• Remark 4: just as in point estimation, sufficient statistics also play an important role in interval estimation, as Theorem 9.5 demonstrates.
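Remark 2 can be made concrete for the standardized normal variable of Equation (9.124). Any split of $\alpha$ into a lower-tail area $\alpha_1$ and an upper-tail area $\alpha - \alpha_1$ gives probability $1 - \alpha$, but the width of the corresponding interval in $U$ is smallest for the symmetric split $\alpha_1 = \alpha/2$. The sketch uses Python's `statistics.NormalDist` for the normal quantiles. The function name and the list of splits are illustrative assumptions.

```python
from statistics import NormalDist

_Z = NormalDist()  # standardized normal U ~ N(0, 1)


def width_for_split(alpha, alpha_lower):
    """Width of (u_lo, u_hi) with P(U < u_lo) = alpha_lower and
    P(U > u_hi) = alpha - alpha_lower, so that P(u_lo < U < u_hi) = 1 - alpha."""
    u_lo = _Z.inv_cdf(alpha_lower)
    u_hi = _Z.inv_cdf(1.0 - (alpha - alpha_lower))
    return u_hi - u_lo


for a1 in (0.005, 0.015, 0.025, 0.035, 0.045):
    print(f"lower-tail area {a1:.3f}: width {width_for_split(0.05, a1):.4f}")
```

The printed widths fall toward $2 \times 1.96 \approx 3.92$ at the symmetric split and rise again on the other side, consistent with the claim that $(-1.96, 1.96)$ is the shortest 95% interval in $U$.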

Theorem 9.5: let $L_1$ and $L_2$ be two statistics based on a sample $X_1, \ldots, X_n$ from a population $X$ with pdf $f(x;\theta)$ such that $P(L_1 < \theta < L_2) = 1 - \alpha$. Let $Y = h(X_1, \ldots, X_n)$ be a sufficient statistic. Then there exist two functions $R_1$ and $R_2$ of $Y$ such that $P(R_1 < \theta < R_2) = 1 - \alpha$ and such that the two interval widths $L = L_2 - L_1$ and $R = R_2 - R_1$ have the same distribution.

This theorem shows that, if a minimum interval width exists, it can be obtained by using functions of sufficient statistics as confidence limits.

The construction of confidence intervals for some important cases will be carried out in the following sections. The method consists essentially of finding an appropriate random variable whose values can be calculated on the basis of observed sample values and the parameter value, but whose distribution does not depend on the parameter. More general methods for obtaining confidence intervals are discussed in Mood (1950, chapter 11) and Wilks (1962, chapter 12).


9.3.2.1 Confidence Interval for $m$ in $N(m, \sigma^2)$ with Known $\sigma^2$

The confidence interval given by Equation (9.126) is designed to estimate the mean of a normal population with known variance. In general terms, the procedure is first to determine a (symmetric) interval in $U$ that achieves a confidence coefficient of $1 - \alpha$. Writing $u_{\alpha/2}$ for the value of $U$ above which the area under $f_U(u)$ is $\alpha/2$, that is, $P(U > u_{\alpha/2}) = \alpha/2$ (see Figure 9.6), we have
$$P(-u_{\alpha/2} < U < u_{\alpha/2}) = 1 - \alpha. \tag{9.129}$$

Figure 9.6 $[100(1-\alpha)]\%$ confidence limits for $U$


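Hence, using the transformation given by Equation (9.123), we have the general result

$$P\left(\bar{X} - u_{\alpha/2}\frac{\sigma}{n^{1/2}} < m < \bar{X} + u_{\alpha/2}\frac{\sigma}{n^{1/2}}\right) = 1 - \alpha. \tag{9.130}$$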


This result can also be used to estimate means of nonnormal populations with 
known variances if the sample size is large enough to justify use of the central 
limit theorem. 
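Equation (9.130) translates directly into a small function. In this sketch, whose function name is hypothetical, the quantile $u_{\alpha/2}$ is computed with Python's `statistics.NormalDist().inv_cdf` instead of being read from Table A.3. Applied to the sample of Equation (9.122), it reproduces the observed interval of Equation (9.127).

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Observed interval of Equation (9.130) for m when sigma is known."""
    alpha = 1.0 - conf
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # u_{alpha/2}: P(U > u) = alpha/2
    half = u * sigma / math.sqrt(n)
    return xbar - half, xbar + half


# Sample of Equation (9.122): xbar = 1.82, sigma^2 = 9, n = 5
lo, hi = z_interval(1.82, 3.0, 5)
print(f"({lo:.2f}, {hi:.2f})")  # prints (-0.81, 4.45)
```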

It is noteworthy that, in this case, the position of the interval is a function of $\bar{X}$ and therefore is a function of the sample. The width of the interval, $2u_{\alpha/2}\sigma/n^{1/2}$, in contrast, depends only on the sample size $n$ (for given $\sigma$ and $\alpha$), being inversely proportional to $n^{1/2}$.
The $[100(1-\alpha)]\%$ confidence interval for $m$ given in Equation (9.130) also provides an estimate of the accuracy of our point estimator $\bar{X}$ for $m$. As we see from Figure 9.7, the true mean $m$ lies within the indicated interval with $[100(1-\alpha)]\%$ confidence. Since $\bar{X}$ is at the center of the interval, the distance between $\bar{X}$ and $m$ can be at most equal to one-half of the interval width. We thus have the result given in Theorem 9.6.

Figure 9.7 Error in point estimator $\bar{X}$ for $m$

Theorem 9.6: let $\bar{X}$ be an estimator for $m$. Then, with $[100(1-\alpha)]\%$ confidence, the error of using this estimator for $m$ is less than

$$\frac{u_{\alpha/2}\,\sigma}{n^{1/2}}.$$

Example 9.18. Problem: let population $X$ be normally distributed with known variance $\sigma^2$. If $\bar{X}$ is used as an estimator for mean $m$, determine the sample size $n$ needed so that the estimation error will be less than a specified amount $\varepsilon$ with $[100(1-\alpha)]\%$ confidence.

Answer: using the theorem given above, the minimum sample size $n$ must satisfy

$$\varepsilon = \frac{u_{\alpha/2}\,\sigma}{n^{1/2}}.$$

Hence, the solution for $n$ is

$$n = \left(\frac{u_{\alpha/2}\,\sigma}{\varepsilon}\right)^{2}. \tag{9.131}$$
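Because Equation (9.131) rarely yields an integer, $n$ is rounded up in practice so that the error bound of Theorem 9.6 is actually met. The following minimal sketch uses a hypothetical function name and computes $u_{\alpha/2}$ with `statistics.NormalDist` rather than reading it from Table A.3.

```python
import math
from statistics import NormalDist


def required_n(sigma, eps, conf=0.95):
    """Smallest integer n with u_{alpha/2} * sigma / sqrt(n) <= eps, from Equation (9.131)."""
    u = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # u_{alpha/2}
    return math.ceil((u * sigma / eps) ** 2)


print(required_n(sigma=10.0, eps=2.0))  # (1.96 * 10 / 2)^2 = 96.04..., so 97
```

Because the unrounded value of $n$ lies just above 96 here, the sample of 97 is the smallest that keeps the error below $\varepsilon = 2$ with 95% confidence.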


9.3.2.2 Confidence Interval for $m$ in $N(m, \sigma^2)$ with Unknown $\sigma^2$

The difference between this problem and the preceding one is that, since $\sigma$ is not known, we can no longer use

$$\frac{\bar{X} - m}{\sigma/n^{1/2}}$$

as the random variable for confidence limit calculations regarding mean $m$. Let us then use sample variance $S^2$ as an unbiased estimator for $\sigma^2$ and consider the random variable

$$Y = \frac{\bar{X} - m}{S/n^{1/2}}. \tag{9.132}$$

The random variable $Y$ is now a function of random variables $\bar{X}$ and $S$. In order to determine its distribution, we first state Theorem 9.7.

Theorem 9.7: Student's t-distribution. Consider a random variable $T$ defined by

$$T = U\left(\frac{V}{n}\right)^{-1/2}. \tag{9.133}$$

If $U$ is N(0, 1), $V$ is $\chi^2$-distributed with $n$ degrees of freedom, and $U$ and $V$ are independent, then the pdf of $T$ has the form

$$f_T(t) = \frac{\Gamma[(n+1)/2]}{\Gamma(n/2)(n\pi)^{1/2}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty. \tag{9.134}$$

This distribution is known as Student's t-distribution with $n$ degrees of freedom; it is named after W.S. Gosset, who used the pseudonym ‘Student’ in his research publications.

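Proof of Theorem 9.7: the proof is straightforward following methods given in Chapter 5. Since $U$ and $V$ are independent, their jpdf is

$$f_{UV}(u, v) = \begin{cases} \dfrac{1}{(2\pi)^{1/2}} e^{-u^2/2} \, \dfrac{v^{(n/2)-1} e^{-v/2}}{2^{n/2}\Gamma(n/2)}, & \text{for } -\infty < u < \infty \text{ and } v > 0; \\[6pt] 0, & \text{elsewhere.} \end{cases} \tag{9.135}$$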


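Consider the transformation from $U$ and $V$ to $T$ and $V$. The method discussed in Section 5.3 leads to

$$f_{TV}(t, v) = f_{UV}[g_1^{-1}(t, v), g_2^{-1}(t, v)]\,|J|, \tag{9.136}$$

where

$$u = g_1^{-1}(t, v) = t\left(\frac{v}{n}\right)^{1/2}, \qquad v = g_2^{-1}(t, v) = v, \tag{9.137}$$

and the Jacobian is

$$J = \begin{vmatrix} \dfrac{\partial g_1^{-1}}{\partial t} & \dfrac{\partial g_1^{-1}}{\partial v} \\[6pt] \dfrac{\partial g_2^{-1}}{\partial t} & \dfrac{\partial g_2^{-1}}{\partial v} \end{vmatrix} = \begin{vmatrix} \left(\dfrac{v}{n}\right)^{1/2} & \dfrac{t}{2(nv)^{1/2}} \\[6pt] 0 & 1 \end{vmatrix} = \left(\frac{v}{n}\right)^{1/2}. \tag{9.138}$$

The substitution of Equations (9.135), (9.137), and (9.138) into Equation (9.136) gives the jpdf $f_{TV}(t, v)$ of $T$ and $V$. The pdf of $T$ as given by Equation (9.134) is obtained by integrating $f_{TV}(t, v)$ with respect to $v$.

It is seen from Equation (9.134) that the t-distribution is symmetrical about the origin. As $n$ increases, it approaches that of a standardized normal random variable.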
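The two closing observations can be checked numerically. Equation (9.134) is easy to evaluate with `math.lgamma`, which keeps the ratio of gamma functions stable for large $n$. The sketch below uses hypothetical function names and prints $f_T(0)$ and $f_T(2)$ for several $n$ next to the standardized normal values. As a spot check, at $n = 1$ the formula reduces to $1/[\pi(1 + t^2)]$.

```python
import math


def t_pdf(t, n):
    """Student's t pdf with n degrees of freedom, Equation (9.134)."""
    # log of Gamma[(n+1)/2] / (Gamma(n/2) (n pi)^(1/2)), computed via lgamma
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1.0 + t * t / n) ** (-(n + 1) / 2)


def normal_pdf(u):
    """Standardized normal pdf, Equation (9.124)."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)


for n in (1, 5, 30, 200):
    print(f"n = {n:3d}: f_T(0) = {t_pdf(0.0, n):.5f}, f_T(2) = {t_pdf(2.0, n):.5f}")
print(f"N(0, 1):  f_U(0) = {normal_pdf(0.0):.5f}, f_U(2) = {normal_pdf(2.0):.5f}")
```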
Returning to random variable $Y$ defined by Equation (9.132), let

$$U = \frac{\bar{X} - m}{\sigma/n^{1/2}} \quad \text{and} \quad V = \frac{(n-1)S^2}{\sigma^2}.$$

Then

$$Y = U\left(\frac{V}{n-1}\right)^{-1/2}, \tag{9.139}$$

where $U$ is clearly distributed according to N(0, 1). We also see from Section 9.1.2 that $(n-1)S^2/\sigma^2$ has the chi-squared distribution with $(n-1)$ degrees of freedom. Furthermore, although we will not verify it here, it can be shown that $\bar{X}$ and $S^2$ are independent. In accordance with Theorem 9.7, random variable $Y$ thus has a t-distribution with $(n-1)$ degrees of freedom.

The random variable Y can now be used to establish confidence intervals for 
mean m. We note that the value of Y depends on the unknown mean m, but its 
distribution does not. 

The t-distribution is tabulated in Table A.4 in Appendix A. Let $t_{n,\alpha/2}$ be the value such that

$$P(T > t_{n,\alpha/2}) = \frac{\alpha}{2},$$

with $n$ representing the number of degrees of freedom (see Figure 9.8). We have the result

$$P(-t_{n-1,\alpha/2} < Y < t_{n-1,\alpha/2}) = 1 - \alpha. \tag{9.140}$$

Upon substituting Equation (9.132) into Equation (9.140), a $[100(1-\alpha)]\%$ confidence interval for mean $m$ is thus given by

$$P\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{n^{1/2}} < m < \bar{X} + t_{n-1,\alpha/2}\frac{S}{n^{1/2}}\right) = 1 - \alpha. \tag{9.141}$$

Since both $\bar{X}$ and $S$ are functions of the sample, both the position and the width of the confidence interval given above will vary from sample to sample.
Figure 9.8 $[100(1-\alpha)]\%$ confidence limits for $T$ with $n$ degrees of freedom


Example 9.19. Problem: let us assume that the annual snowfall in the Buffalo area is normally distributed. Using the snowfall record from 1970-79 as given in Problem 8.2(g) (Table 8.6, page 257), determine a 95% confidence interval for mean $m$.

Answer: for this example, $\alpha = 0.05$, $n = 10$, the observed sample mean is

$$\bar{x} = \tfrac{1}{10}(120.5 + 97.0 + \cdots + 97.3) = 112.4,$$

and the observed sample variance is

$$s^2 = \tfrac{1}{9}\left[(120.5 - 112.4)^2 + (97.0 - 112.4)^2 + \cdots + (97.3 - 112.4)^2\right] = 1414.3.$$

Using Table A.4, we find that $t_{9,0.025} = 2.262$. Substituting all the values given above into Equation (9.141) gives

$$P(85.5 < m < 139.3) = 0.95.$$

It is clear that this interval would be different if we had incorporated more observations into our calculations or if we had chosen a different set of yearly snowfall data.
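Because the full 1970-79 record is not reproduced here, the sketch below, whose function name is hypothetical, starts from the summary values of Example 9.19. It takes the critical value $t_{9,0.025} = 2.262$ from Table A.4 as an input, since Python's standard library provides no Student's t quantile function.

```python
import math


def t_interval(xbar, s2, n, t_crit):
    """Observed interval of Equation (9.141); t_crit is t_{n-1, alpha/2} from Table A.4."""
    half = t_crit * math.sqrt(s2 / n)
    return xbar - half, xbar + half


# Summary values of Example 9.19: xbar = 112.4, s^2 = 1414.3, n = 10, t_{9,0.025} = 2.262
lo, hi = t_interval(112.4, 1414.3, 10, 2.262)
print(f"({lo:.1f}, {hi:.1f})")  # prints (85.5, 139.3)
```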
9.3.2.3 Confidence Interval for $\sigma^2$ in $N(m, \sigma^2)$

An unbiased point estimator for population variance $\sigma^2$ is $S^2$. For the construction of confidence intervals for $\sigma^2$, let us use the random variable

$$D = \frac{(n-1)S^2}{\sigma^2}, \tag{9.142}$$

which has been shown in Section 9.1.2 to have a chi-squared distribution with $(n-1)$ degrees of freedom. Letting $\chi^2_{n,\alpha/2}$ be the value such that $P(D > \chi^2_{n,\alpha/2}) = \alpha/2$ with $n$ degrees of freedom, we can write (see Figure 9.9)

$$P(\chi^2_{n-1,1-\alpha/2} < D < \chi^2_{n-1,\alpha/2}) = 1 - \alpha, \tag{9.143}$$

which gives, upon substituting Equation (9.142) for $D$,

$$P\left[\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}\right] = 1 - \alpha. \tag{9.144}$$


Let us note that the $[100(1-\alpha)]\%$ confidence interval for $\sigma^2$ as defined by Equation (9.144) is not the minimum-width interval on the basis of a given sample. As we see in Figure 9.9, a shift to the left, leaving area $\alpha/2 - \varepsilon$ to the left and area $\alpha/2 + \varepsilon$ to the right under the $f_D(d)$ curve, where $\varepsilon$ is an appropriate amount, will result in a smaller confidence interval. This is because the width needed at the left to give an increase of $\varepsilon$ in the area is less than the corresponding width eliminated at the right. The minimum interval width for a given number of degrees of freedom can be determined by interpolation from tabulated values of the PDF of the chi-squared distribution.

Figure 9.9 $[100(1-\alpha)]\%$ confidence limits for $D$ with $n$ degrees of freedom

Figure 9.10 One-sided $[100(1-\alpha)]\%$ confidence limit for $D$ with $n$ degrees of freedom

Table A.5 in Appendix A gives selected values of $\chi^2_{n,\alpha}$ for various values of $n$ and $\alpha$. For convenience, Equation (9.144) is commonly used for constructing two-sided confidence intervals for $\sigma^2$ of a normal population. If a one-sided confidence interval is desired, it is then given by (see Figure 9.10)

$$P\left[\sigma^2 > \frac{(n-1)S^2}{\chi^2_{n-1,\alpha}}\right] = 1 - \alpha. \tag{9.145}$$


Example 9.20. Consider Example 9.19 again; let us determine both two-sided and one-sided 95% confidence intervals for $\sigma^2$.

As seen from Example 9.19, the observed sample variance is

$$s^2 = 1414.3.$$

The values of $\chi^2_{9,0.975}$, $\chi^2_{9,0.025}$, and $\chi^2_{9,0.05}$ are obtained from Table A.5 to be as follows:

$$\chi^2_{9,0.975} = 2.700, \qquad \chi^2_{9,0.025} = 19.023, \qquad \chi^2_{9,0.05} = 16.919.$$

Equations (9.144) and (9.145) thus lead to, with $n = 10$ and $\alpha = 0.05$,

$$P(669.12 < \sigma^2 < 4714.33) = 0.95,$$

and

$$P(\sigma^2 > 752.3) = 0.95.$$
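The arithmetic of Example 9.20 is sketched below. Python's standard library has no chi-squared quantile function, so the three upper-tail points for 9 degrees of freedom ($\chi^2_{9,0.025} = 19.023$, $\chi^2_{9,0.975} = 2.700$, $\chi^2_{9,0.05} = 16.919$, the usual table values) are passed in as arguments. The function name is hypothetical.

```python
def variance_limits(s2, n, chi2_hi, chi2_lo, chi2_one):
    """Two-sided limits of Equation (9.144) and one-sided limit of Equation (9.145).

    chi2_hi = chi2_{n-1, alpha/2}, chi2_lo = chi2_{n-1, 1-alpha/2} and
    chi2_one = chi2_{n-1, alpha} are upper-tail points read from a table.
    """
    ss = (n - 1) * s2  # (n - 1) s^2
    return (ss / chi2_hi, ss / chi2_lo), ss / chi2_one


(lo, hi), lower = variance_limits(1414.3, 10, 19.023, 2.700, 16.919)
print(f"two-sided ({lo:.2f}, {hi:.2f}); one-sided sigma^2 > {lower:.1f}")
# prints two-sided (669.12, 4714.33); one-sided sigma^2 > 752.3
```

The large spread of the two-sided interval, relative to that for the mean in Example 9.19, reflects how little ten observations reveal about a variance.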


9.3.2.4 Confidence Interval for a Proportion

Consider now the construction of confidence intervals for $p$ in the binomial distribution

$$p_X(k) = p^k(1-p)^{1-k}, \qquad k = 0, 1.$$

In the above, parameter $p$ represents the proportion in a binomial experiment. Given a sample of size $n$ from population $X$, we see from Example 9.10 that an unbiased and efficient estimator for $p$ is $\bar{X}$. For large $n$, random variable $\bar{X}$ is approximately normal with mean $p$ and variance $p(1-p)/n$.

Defining

$$U = (\bar{X} - p)\left[\frac{p(1-p)}{n}\right]^{-1/2}, \tag{9.146}$$

random variable $U$ tends to N(0, 1) as $n$ becomes large. In terms of $U$, we have the same situation as in Section 9.3.2.1, and Equation (9.129) gives

$$P(-u_{\alpha/2} < U < u_{\alpha/2}) = 1 - \alpha. \tag{9.147}$$


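The substitution of Equation (9.146) into Equation (9.147) gives

$$P\left\{-u_{\alpha/2} < (\bar{X} - p)\left[\frac{p(1-p)}{n}\right]^{-1/2} < u_{\alpha/2}\right\} = 1 - \alpha. \tag{9.148}$$

In order to determine confidence limits for $p$, we need to solve for $p$ satisfying the equation

$$|\bar{X} - p|\left[\frac{p(1-p)}{n}\right]^{-1/2} \le u_{\alpha/2},$$

or, equivalently,

$$(\bar{X} - p)^2 \le \frac{u_{\alpha/2}^2\, p(1-p)}{n}. \tag{9.149}$$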
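Upon transposing the right-hand side, we have

$$p^2\left(1 + \frac{u_{\alpha/2}^2}{n}\right) - p\left(2\bar{X} + \frac{u_{\alpha/2}^2}{n}\right) + \bar{X}^2 \le 0. \tag{9.150}$$

In Equation (9.150), the left-hand side defines a parabola $g(p)$, as shown in Figure 9.11, and the two roots $L_1$ and $L_2$ of Equation (9.150) with the equality sign define the interval within which the parabola is negative. Hence, solving the quadratic equation defined by Equation (9.150), we have

$$L_{1,2} = \frac{\bar{X} + \dfrac{u_{\alpha/2}^2}{2n} \mp u_{\alpha/2}\left[\dfrac{\bar{X}(1-\bar{X})}{n} + \dfrac{u_{\alpha/2}^2}{4n^2}\right]^{1/2}}{1 + \dfrac{u_{\alpha/2}^2}{n}}. \tag{9.151}$$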


For large $n$, they can be approximated by

$$L_{1,2} = \bar{X} \mp u_{\alpha/2}\left[\frac{\bar{X}(1-\bar{X})}{n}\right]^{1/2}. \tag{9.152}$$


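An approximate $[100(1-\alpha)]\%$ confidence interval for $p$ is thus given by, for large $n$,

$$P\left\{\bar{X} - u_{\alpha/2}\left[\frac{\bar{X}(1-\bar{X})}{n}\right]^{1/2} < p < \bar{X} + u_{\alpha/2}\left[\frac{\bar{X}(1-\bar{X})}{n}\right]^{1/2}\right\} = 1 - \alpha. \tag{9.153}$$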
In this approximation, sample mean $\bar{X}$ is at the center of the interval, whose width is a function of the sample and the sample size.

Example 9.21. Problem: in a random sample of 500 persons in the city of Los Angeles it was found that 372 did not approve of US energy policy. Determine a 95% confidence interval for $p$, the actual proportion of the Los Angeles population registering disapproval.

Answer: in this example, $n = 500$, $\alpha = 0.05$, and the observed sample mean is $\bar{x} = 372/500 = 0.744 \approx 0.74$. Table A.3 gives $u_{0.025} = 1.96$. Substituting these values into Equation (9.153) then yields

$$P(0.74 - 0.04 < p < 0.74 + 0.04) = P(0.70 < p < 0.78) = 0.95.$$
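The exact limits of Equation (9.151) and the large-$n$ approximation of Equation (9.153) can be compared directly. The sketch below has a hypothetical function name. It uses `statistics.NormalDist` for $u_{\alpha/2}$ and the unrounded $\bar{x} = 0.744$ of Example 9.21.

```python
import math
from statistics import NormalDist


def proportion_limits(successes, n, conf=0.95):
    """Return (roots of Equation (9.151), large-n limits of Equation (9.153)) for p."""
    xbar = successes / n
    u = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # u_{alpha/2}
    c = u * u / n
    # Roots of p^2 (1 + c) - p (2 xbar + c) + xbar^2 = 0, Equation (9.150)
    center = (xbar + c / 2.0) / (1.0 + c)
    half = u * math.sqrt(xbar * (1.0 - xbar) / n + c / (4.0 * n)) / (1.0 + c)
    exact = (center - half, center + half)
    # Large-n approximation, Equation (9.152)
    h = u * math.sqrt(xbar * (1.0 - xbar) / n)
    approx = (xbar - h, xbar + h)
    return exact, approx


exact, approx = proportion_limits(372, 500)  # data of Example 9.21
print("Equation (9.151):", [round(v, 4) for v in exact])
print("Equation (9.153):", [round(v, 4) for v in approx])
```

For these data the two sets of limits differ by only about 0.002 at either end, and both are close to the interval $(0.70, 0.78)$ found above. The gap between them widens when $n$ is small or $\bar{x}$ is near 0 or 1.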
PROBLEMS

The following notations and abbreviations are used in some statements of the problems: 

$\bar{X}$ = sample mean

$\bar{x}$ = observed sample mean

$S^2$ = sample variance

$s^2$ = observed sample variance

CRLB = Cramer-Rao lower bound

ME = moment estimator, or moment estimate

MLE = maximum likelihood estimator, or maximum likelihood estimate

pdf = probability density function

pmf = probability mass function

9.1 In order to make enrollment projections, a survey is made of numbers of children in 100 families in a school district; the result is given in Table 9.2. Determine $\bar{x}$, the observed sample mean, and $s^2$, the observed sample variance, on the basis of these 100 sample values.

Table 9.2 Data for Problem 9.1

Children (No.)    0    1    2    3    4    5    6    7
Families (No.)   21   24   30   16    4    4    0    1

n = 100


9.2 Verify that the variance of sample variance $S^2$ as defined by Equation (9.7) is given by Equation (9.10).

9.3 Verify that the mean and variance of the $k$th sample moment as defined by Equation (9.14) are given by Equations (9.15).

9.4 Let $X_1, X_2, \ldots, X_{10}$ be a sample of size 10 from the standardized normal distribution N(0, 1). Determine probability $P(\bar{X} < 1)$.

9.5 Let $X_1, X_2, \ldots, X_{10}$ be a sample of size 10 from a uniformly distributed random variable in interval (0, 1).

(a) Determine the pdfs of $X_{(1)}$ and $X_{(10)}$.

(b) Find the probabilities $P[X_{(1)} > 0.5]$ and $P[X_{(10)} < 0.5]$.

(c) Determine $E\{X_{(1)}\}$ and $E\{X_{(10)}\}$.

9.6 A sample of size $n$ is taken from a population $X$ with pdf

$$f_X(x) = \begin{cases} e^{-x}, & \text{for } x \ge 0; \\ 0, & \text{elsewhere.} \end{cases}$$

Determine the probability density function of statistic $\bar{X}$. (Hint: use the method of characteristic functions discussed in Chapter 4.)

9.7 Two samples $X_1$ and $X_2$ are taken from an exponential random variable $X$ with unknown parameter $\theta$; that is,

$$f_X(x;\theta) = \frac{1}{\theta}e^{-x/\theta}, \qquad x \ge 0.$$

We propose two estimators for $\theta$ in the forms

$$\hat{\Theta}_1 = \bar{X} = \frac{X_1 + X_2}{2}, \qquad \hat{\Theta}_2 = \frac{4}{\pi}(X_1 X_2)^{1/2}.$$

In terms of unbiasedness and minimum variance, which one is the better of the two?

9.8 Let $X_1$ and $X_2$ be a sample of size 2 from a population $X$ with mean $m$ and variance $\sigma^2$.

(a) Two estimators for $m$ are proposed to be

$$\hat{M}_1 = \bar{X} = \frac{X_1 + X_2}{2}, \qquad \hat{M}_2 = \frac{X_1 + 2X_2}{3}.$$

Which is the better estimator?

(b) Consider an estimator for $m$ in the form

$$\hat{M} = aX_1 + (1-a)X_2, \qquad 0 \le a \le 1.$$

Determine value $a$ that gives the best estimator in this form.

9.9 It is known that a certain proportion, say $p$, of manufactured parts is defective. From a supply of parts, $n$ are chosen at random and are tested. Define the readings (sample $X_1, X_2, \ldots, X_n$) to be 1 if good and 0 if defective. Then, a good estimator for $p$, $\hat{P}$, is

$$\hat{P} = 1 - \bar{X} = 1 - \frac{1}{n}(X_1 + \cdots + X_n).$$

(a) Is $\hat{P}$ unbiased?

(b) Is $\hat{P}$ consistent?

(c) Show that $\hat{P}$ is an MLE of $p$.

9.10 Let $X$ be a random variable with mean $m$ and variance $\sigma^2$, and let $X_1, X_2, \ldots, X_n$ be independent samples of $X$. Suppose an estimator for $\sigma^2$, $\hat{\Sigma}^2$, is found from the formula

$$\hat{\Sigma}^2 = \frac{1}{2(n-1)}\left[(X_2 - X_1)^2 + (X_3 - X_2)^2 + \cdots + (X_n - X_{n-1})^2\right].$$

Is $\hat{\Sigma}^2$ an unbiased estimator? Verify your answer.

9.11 The geometric mean $(X_1 X_2 \cdots X_n)^{1/n}$ is proposed as an estimator for the unknown median of a lognormally distributed random variable $X$. Is it unbiased? Is it unbiased as $n \to \infty$?

9.12 Let $X_1, X_2, X_3$ be a sample of size three from a uniform distribution for which the pdf is

$$f_X(x;\theta) = \begin{cases} \dfrac{1}{\theta}, & \text{for } 0 \le x \le \theta; \\[4pt] 0, & \text{elsewhere.} \end{cases}$$

Suppose that $aX_{(1)}$ and $bX_{(3)}$ are proposed as two possible estimators for $\theta$.

(a) Determine $a$ and $b$ such that these estimators are unbiased.

(b) Which one is the better of the two? In the above, $X_{(j)}$ is the $j$th-order statistic.

9.13 Let $X_1, \ldots, X_n$ be a sample from a population whose $k$th moment $\alpha_k = E\{X^k\}$ exists. Show that the $k$th sample moment

$$M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$$

is a consistent estimator for $\alpha_k$.

9.14 Let $\theta$ be the parameter to be estimated in each of the distributions given below. For each case, determine the CRLB for the variance of any unbiased estimator for $\theta$.

(a) $f(x;\theta) = \dfrac{1}{\theta}e^{-x/\theta}$, $x \ge 0$.

(b) $f(x;\theta) = \theta x^{\theta-1}$, $0 \le x \le 1$, $\theta > 0$.

(c) $p(x;\theta) = \theta^x(1-\theta)^{1-x}$, $x = 0, 1$.

(d) $p(x;\theta) = \dfrac{\theta^x e^{-\theta}}{x!}$, $x = 0, 1, 2, \ldots$.

9.15 Determine the CRLB for the variances of $\hat{M}$ and $\hat{\Sigma}^2$, which are, respectively, unbiased estimators for $m$ and $\sigma^2$ in the normal distribution $N(m, \sigma^2)$.

9.16 The method of moments is based on equating the $k$th sample moment $M_k$ to the $k$th population moment $\alpha_k$; that is,

$$M_k = \alpha_k.$$

(a) Verify Equations (9.15).

(b) Show that $M_k$ is a consistent estimator for $\alpha_k$.
9.17 Using the maximum likelihood method and the moment method, determine the respective estimators $\hat{\Theta}$ of $\theta$ and compare their asymptotic variances for the following two cases:

(a) Case 1:

$$f(x;\theta) = \frac{1}{(2\pi)^{1/2}\theta}\exp\left[-\frac{(x-m)^2}{2\theta^2}\right],$$

where $m$ is a known constant.

(b) Case 2:

$$f(x;\theta) = \begin{cases} \dfrac{1}{x(2\pi\theta)^{1/2}}\exp\left[-\dfrac{(\ln x)^2}{2\theta}\right], & \text{for } x > 0; \\[6pt] 0, & \text{elsewhere.} \end{cases}$$

9.18 Consider each distribution in Problem 9.14.

(a) Determine an ME for $\theta$ on the basis of a sample of size $n$ by using the first-order moment equation. Determine its asymptotic efficiency (i.e. its efficiency as $n \to \infty$). (Hint: use the second of Equations (9.62) for the asymptotic variance of ME.)

(b) Determine the MLE for $\theta$.

9.19 The number of transistor failures in an electronic computer may be considered as a random variable.

(a) Let $X$ be the number of transistor failures per hour. What is an appropriate distribution for $X$? Explain your answer.

(b) The numbers of transistor failures per hour for 96 hours are recorded in Table 9.3. Estimate the parameter(s) of the distribution for $X$ based on these data by using the method of maximum likelihood.

Table 9.3 Data for Problem 9.19

Hourly failures (No.)    0    1    2    3   >3
Hours (No.)             59   27    9    1    0

Total = 96

(c) A certain computation requires 20 hours of computing time. Use this model and find the probability that this computation can be completed without a computer breakdown (a breakdown occurs when two or more transistors fail).

9.20 Electronic components are tested for reliability. Let $p$ be the probability of an electronic component being successful and $1 - p$ be the probability of component failure. If $X$ is the number of trials at which the first failure occurs, then it has the geometric distribution

$$p_X(k;p) = (1-p)p^{k-1}, \qquad k = 1, 2, \ldots.$$

Suppose that a sample $X_1, \ldots, X_n$ is taken from population $X$, each $X_j$ being the number of components tested when the first failure occurs.

(a) Determine the MLE of $p$.

(b) Determine the MLE of $P(X > 9)$, the probability that the component will not fail in nine trials. Note:



9.21 The pdf of a population $X$ is given by

$$f_X(x;\theta) = \begin{cases} \cdots, & \text{for } 0 \le x \le \theta; \\ 0, & \text{elsewhere.} \end{cases}$$

Based on a sample of size $n$:

(a) Determine the MLE and ME for $\theta$.

(b) Which one of the two is the better estimator?

9.22 Assume that $X$ has a shifted exponential distribution, with

$$f_X(x;a) = \cdots, \qquad x \ge a.$$

On the basis of a sample of size $n$ from $X$, determine the MLE and ME for $a$.

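9.23 Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from a uniform distribution

$$f(x;\theta) = \begin{cases} 1, & \text{for } \theta - \tfrac{1}{2} \le x \le \theta + \tfrac{1}{2}; \\ 0, & \text{elsewhere.} \end{cases}$$

Show that every statistic $h(X_1, \ldots, X_n)$ satisfying

$$X_{(n)} - \tfrac{1}{2} \le h(X_1, \ldots, X_n) \le X_{(1)} + \tfrac{1}{2}$$

is an MLE for $\theta$, where $X_{(j)}$ is the $j$th-order statistic. Determine an MLE for $\theta$ when the observed sample values are (1.5, 1.4, 2.1, 2.0, 1.9, 2.0, 2.3), with $n = 7$.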

9.24 Using the 214 measurements given in Example 9.11 (see Table 9.1), determine the MLE for $\lambda$ in the exponential distribution given by Equation (9.70).

9.25 Let us assume that random variable X in Problem 8.2(j) is Poisson distributed. 
Using the 58 sample values given (see Figure 8.6), determine the MLE and ME for 
the mean number of blemishes. 

9.26 The time-to-failure $T$ of a certain device has a shifted exponential distribution; that is,

$$f_T(t;t_0,\lambda) = \begin{cases} \lambda e^{-\lambda(t - t_0)}, & \text{for } t \ge t_0; \\ 0, & \text{elsewhere.} \end{cases}$$

Let $T_1, T_2, \ldots, T_n$ be a sample from $T$.

(a) Determine the MLE and ME for $\lambda$ ($\hat{\Lambda}_{\rm ML}$ and $\hat{\Lambda}_{\rm ME}$, respectively) assuming $t_0$ is known.

(b) Determine the MLE and ME for $t_0$ ($\hat{T}_{0,\rm ML}$ and $\hat{T}_{0,\rm ME}$, respectively) assuming $\lambda$ is known.

(c) Determine the MLEs and MEs for both $\lambda$ and $t_0$ assuming both are unknown.

9.27 If $X_1, X_2, \ldots, X_n$ is a sample from the gamma distribution; that is,

$$f(x;r,\lambda) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{\Gamma(r)}, \qquad x \ge 0, \quad r, \lambda > 0,$$

show that:

(a) If $r$ is known and $\lambda$ is the parameter to be estimated, both the MLE and ME for $\lambda$ are $\hat{\Lambda} = r/\bar{X}$.

(b) If both $r$ and $\lambda$ are to be estimated, then the method of moments and the method of maximum likelihood lead to different estimators for $r$ and $\lambda$. (It is not necessary to determine these estimators.)

9.28 Consider the Buffalo yearly snowfall data, given in Problem 8.2(g) (see Table 8.6), and assume that a normal distribution is appropriate.

(a) Find estimates for the parameters by means of the moment method and the method of maximum likelihood.

(b) Estimate from the model the probability of having another blizzard of 1977 [$P(X > 199.4)$].

9.29 Recorded annual flows $Y$ (in cfs) of a river at a given point are 141, 146, 166, 209, 228,
234, 260, 278, 319, 351, 383, 500, 522, 589, 696, 833, 888, 1173, 1200, 1258, 1340, 
1390, 1420, 1423, 1443, 1561, 1650, 1810, 2004, 2013, 2016, 2080, 2090, 2143, 2185, 
2316, 2582, 3050, 3186, 3222, 3660, 3799, 3824, 4099, and 6634. Assuming that Y 
follows a lognormal distribution, determine the MLEs of the distribution parameters. 

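9.30 Let $X_1$ and $X_2$ be a sample of size 2 from a uniform distribution with pdf

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta}, & \text{for } 0 \le x \le \theta; \\ 0, & \text{elsewhere.} \end{cases}$$

Determine constant $c$ so that the interval

$$0 < \theta < c(X_1 + X_2)$$

is a $[100(1 - \alpha)]\%$ confidence interval for $\theta$.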

9.31 The fuel consumption of a certain type of vehicle is approximately normal, with 
standard deviation 3 miles per gallon. If a sample of 64 vehicles has an average fuel 
consumption of 16 miles per gallon: 

(a) Determine a 95% confidence interval for the mean fuel consumption of all 
vehicles of this type. 

(b) With 95% confidence, what is the possible error if the mean fuel consumption 
is taken to be 16 miles per gallon? 

(c) How large a sample is needed if we wish to be 95% confident that the mean will 
be within 0.5 miles per gallon of the true mean? 
9.32 A total of 93 yearly Buffalo snowfall measurements are given in Problem 8.2(g)
(see Table 8.6, page 255). Assume that the yearly snowfall is approximately normal with
standard deviation $\sigma = 26$ inches. Determine 95% confidence intervals for the mean using
measurements of (a) 1909 to 1939, (b) 1909 to 1959, (c) 1909 to 1979, and (d) 1909
to 1999. Display these intervals graphically.

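9.33 Let $\bar{X}_1$ and $\bar{X}_2$ be independent sample means from two normal populations
$N(m_1, \sigma_1^2)$ and $N(m_2, \sigma_2^2)$, respectively. If $\sigma_1^2$ and $\sigma_2^2$ are known, show that a
$[100(1 - \alpha)]\%$ confidence interval for $m_1 - m_2$ is given by

$$P\left[(\bar{X}_1 - \bar{X}_2) - u_{\alpha/2}\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{1/2} < m_1 - m_2 < (\bar{X}_1 - \bar{X}_2) + u_{\alpha/2}\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{1/2}\right] = 1 - \alpha,$$

where $n_1$ and $n_2$ are, respectively, the sample sizes from $N(m_1, \sigma_1^2)$ and $N(m_2, \sigma_2^2)$,
and $u_{\alpha/2}$ is the value of standardized normal random variable $U$ such that
$P(U > u_{\alpha/2}) = \alpha/2$.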

9.34 Let us assume that random variable $X$ in Problem 8.2(e) has a Poisson distribution
with pmf

$$p_X(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

Use the sample values of $X$ given in Problem 8.2(e) (see Table 8.5, page 255)
and:

(a) Determine MLE $\hat{\Lambda}$ for $\lambda$.

(b) Determine a 95% confidence interval for $\lambda$ using asymptotic properties of
MLE $\hat{\Lambda}$.

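9.35 Assume that the lifespan of US males is normally distributed with unknown
mean $m$ and unknown variance $\sigma^2$. A sample of 30 mortality histories of US males
shows that

$$\bar{x} = \frac{1}{30}\sum_{i=1}^{30} x_i = 71.3 \text{ years}, \qquad s^2 = \frac{1}{29}\sum_{i=1}^{30}(x_i - \bar{x})^2 = \ldots\ (\text{years})^2.$$

Determine the observed values of 95% confidence intervals for $m$ and $\sigma^2$.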

9.36 The life of light bulbs manufactured in a certain plant can be assumed to be 
normally distributed. A sample of 15 light bulbs gives the observed sample mean 
$\bar{x} = 1100$ hours and the observed sample standard deviation $s = 50$ hours.

(a) Determine a 95% confidence interval for the average life. 

(b) Determine two-sided and one-sided 95% confidence intervals for its 
variance. 

9.37 A total of 12 of 100 manufactured items examined are found to be defective. 

(a) Find a 99% confidence interval for the proportion of defective items in the 
manufacturing process. 
(b) With 99% confidence, what is the possible error if the proportion is estimated
to be 12/100 = 0.12? 

9.38 In a public opinion poll such as the one described in Example 9.21, determine the
minimum sample size needed for the poll so that with 95% confidence the sample
mean will be within 0.05 of the true proportion. [Hint: use the fact that
$\bar{X}(1 - \bar{X}) \le 1/4$ in Equation (9.153).]
Model Verification


The parameter estimation procedures developed in Chapter 9 presume a distribution
for the population. The validity of the model-building process based
on this approach thus hinges on whether the hypothesized distribution can be
substantiated. Indeed, if the hypothesized distribution is off the mark, the resulting
probabilistic model, with parameters estimated by any procedure, however elegant,
would at best still give a poor representation of the underlying physical or
natural phenomenon.

In this chapter, we wish to develop methods of testing or verifying a hypothesized
distribution for a population on the basis of a sample taken from the
population. Some aspects of this problem were addressed in Chapter 8, in
which, by means of histograms and frequency diagrams, a graphical comparison
between the hypothesized distribution and observed data was made. In the
chemical yield example, for instance, a comparison between the shape of a
normal distribution and the frequency diagram constructed from the data, as
shown in Figure 8.1, suggested that the normal model is reasonable in that case.

However, the graphical procedure described above is clearly subjective and
nonquantitative. On a more objective and quantitative basis, the problem of
model verification on the basis of sample information falls within the framework
of testing of hypotheses. Some basic concepts in this area of statistical
inference are now introduced.


10.1 PRELIMINARIES 

In our development, statistical hypotheses concern functional forms of the
assumed distributions; these distributions may be specified completely with
prespecified values for their parameters or they may be specified with parameters
yet to be estimated from the sample.

Let $X_1, X_2, \ldots, X_n$ be an independent sample of size $n$ from a population $X$
with a hypothesized probability density function (pdf) $f(x; \theta)$ or probability
mass function (pmf) $p(x; \theta)$, where $\theta$ may be specified or unspecified. We denote
by H the hypothesis that the sample represents $n$ values of a random
variable with pdf $f(x; \theta)$ or pmf $p(x; \theta)$. This hypothesis is called a simple hypothesis
when the underlying distribution is completely specified; that is, the parameter
values are specified together with the functional form of the pdf or the pmf;
otherwise, it is a composite hypothesis. To construct a criterion for hypothesis
testing, it is necessary that an alternative hypothesis be established against
which hypothesis H can be tested. An example of an alternative hypothesis is
simply another hypothesized distribution, or, as another example, hypothesis
H can be tested against the alternative hypothesis that hypothesis H is not true.
In our applications, the latter choice is considered more practical and we shall
in general deal with the task of either accepting or rejecting hypothesis H on
the basis of a sample from the population.


10.1.1 TYPE-I AND TYPE-II ERRORS 

As in parameter estimation, errors or risks are inherent in deciding whether a
hypothesis H should be accepted or rejected on the basis of sample information.
Hypothesis tests are therefore generally compared in terms of the
probabilities of the errors that might be committed. There are basically two types
of errors that are likely to be made, namely, rejecting H when in fact H is true or,
alternatively, accepting H when in fact H is false. We formalize the above with
Definition 10.1.

Definition 10.1. In testing hypothesis H, a Type-I error is committed when H
is rejected when in fact H is true; a Type-II error is committed when H is 
accepted when in fact H is false. 

In hypothesis testing, an important consideration in constructing statistical
tests is thus to control, insofar as possible, the probabilities of making these
errors. Let us note that, for a given test, Type-I error probabilities can be
evaluated once hypothesis H is given, that is, once a hypothesized distribution is
specified. In contrast, Type-II error probabilities are dictated by the specification
of an alternative hypothesis. In our problem, the alternative hypothesis is simply
that hypothesis H is not true. The fact that the class of alternatives is so large
makes it difficult to use Type-II errors as a criterion. In what follows, methods
of hypothesis testing are discussed based on Type-I errors only.


10.2 CHI-SQUARED GOODNESS-OF-FIT TEST 

As mentioned above, the problem to be addressed is one of testing hypothesis H,
which specifies the probability distribution for a population $X$, against the
alternative that the probability distribution of $X$ is not of the stated type, on the
basis of a sample of size $n$ from population $X$. One of the most popular and most
versatile tests devised for this purpose is the chi-squared goodness-of-fit
test introduced by Pearson (1900).


10.2.1 THE CASE OF KNOWN PARAMETERS 


Let us first assume that the hypothesized distribution is completely specified
with no unknown parameters. In order to test hypothesis H, some statistic
$h(X_1, X_2, \ldots, X_n)$ of the sample is required that gives a measure of deviation of
the observed distribution, as constructed from the sample, from the hypothesized
distribution.

In the test, the statistic used is related to, roughly speaking, the difference
between the frequency diagram constructed from the sample and a corresponding
diagram constructed from the hypothesized distribution. Let the range
space of $X$ be divided into $k$ mutually exclusive intervals $A_1, A_2, \ldots, A_k$,
and let $N_i$ be the number of $X_j$ falling into $A_i$, $i = 1, 2, \ldots, k$. Then, the observed
probabilities $P(A_i)$ are given by

$$\text{observed } P(A_i) = \frac{N_i}{n}, \quad i = 1, 2, \ldots, k. \tag{10.1}$$

The theoretical probabilities $P(A_i)$ can be obtained from the hypothesized
population distribution. Let us denote these by

$$\text{theoretical } P(A_i) = p_i, \quad i = 1, 2, \ldots, k. \tag{10.2}$$

A logical choice of a statistic giving a measure of deviation is

$$\sum_{i=1}^{k} c_i \left(\frac{N_i}{n} - p_i\right)^2, \tag{10.3}$$

which is a natural least-squares type deviation measure. Pearson (1900) showed
that, if we take coefficients $c_i = n/p_i$, the statistic defined by Expression (10.3)
has particularly simple properties. Hence, we choose as our deviation measure

$$D = \sum_{i=1}^{k} \frac{n}{p_i}\left(\frac{N_i}{n} - p_i\right)^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}. \tag{10.4}$$
Let us note that $D$ is a statistic since it is a function of the $N_i$, which are, in turn,
functions of sample $X_1, \ldots, X_n$. The distribution of statistic $D$ is given in
Theorem 10.1, attributable to Pearson (1900).

Theorem 10.1: assuming that hypothesis H is true, the distribution of $D$
defined by Equation (10.4) approaches a chi-squared distribution with $(k - 1)$
degrees of freedom as $n \to \infty$. Its pdf is given by [see Equation (7.67)]

$$f_D(d) = \begin{cases} \dfrac{1}{2^{(k-1)/2}\,\Gamma\!\left(\frac{k-1}{2}\right)}\, d^{(k-3)/2} e^{-d/2}, & \text{for } d \ge 0; \\ 0, & \text{elsewhere.} \end{cases} \tag{10.5}$$

Note that this distribution is independent of the hypothesized distribution.

Proof of Theorem 10.1: The complete proof, which can be found in Cramér
(1946) and in other advanced texts in statistics, will not be attempted here. To 
demonstrate its plausibility, we only sketch the proof for the k = 2 case. 

For $k = 2$, random variable $D$ is

$$D = \frac{(N_1 - np_1)^2}{np_1} + \frac{(N_2 - np_2)^2}{np_2}.$$

Since $N_1 + N_2 = n$ and $p_1 + p_2 = 1$, we can write

$$D = \frac{(N_1 - np_1)^2}{np_1} + \frac{[n - N_1 - n(1 - p_1)]^2}{np_2} = (N_1 - np_1)^2\left(\frac{1}{np_1} + \frac{1}{np_2}\right) = \frac{(N_1 - np_1)^2}{np_1(1 - p_1)}. \tag{10.6}$$

Now, recalling that $N_1$ is the number of, say, successes in $n$ trials, with $p_1$ being
the probability of success, it is a binomial random variable with $E\{N_1\} = np_1$
and $\operatorname{var}\{N_1\} = np_1(1 - p_1)$ if hypothesis H is true. As $n$ increases, we have seen
in Chapter 7 that $N_1$ approaches a normal distribution by virtue of the central
limit theorem (Section 7.2.1). Hence, the distribution of random variable $U$,
defined by

$$U = \frac{N_1 - np_1}{[np_1(1 - p_1)]^{1/2}},$$

approaches $N(0, 1)$ as $n \to \infty$. Since

$$D = U^2,$$
following Equation (10.6), random variable $D$ thus approaches a chi-squared
distribution with one degree of freedom, and the proof is complete for $k = 2$.
The proof for an arbitrary $k$ proceeds in a similar fashion.
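As a numerical check on the $k = 2$ argument, the probability $P(D > 3.841)$, where 3.841 is the upper 5% point of the chi-squared distribution with one degree of freedom, should approach 0.05 as $n$ grows. Because $N_1$ is binomial, this probability can be computed exactly rather than simulated. The Python sketch below does so; the function names are illustrative, not from the text. For moderate and large $n$ the result settles near 0.05, with small oscillations caused by the discreteness of $N_1$.

```python
import math

CHI2_1DF_UPPER_5PCT = 3.841  # upper 5% point of chi-squared with 1 degree of freedom

def binomial_pmf(k, n, p):
    """Binomial pmf, evaluated through log-gamma so that large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def prob_d_exceeds(n, p1, threshold=CHI2_1DF_UPPER_5PCT):
    """Exact P(D > threshold) for k = 2, where N1 ~ Binomial(n, p1) and
    D = (N1 - n p1)^2 / (n p1 (1 - p1)) as in Equation (10.6)."""
    var = n * p1 * (1.0 - p1)
    return sum(binomial_pmf(k, n, p1)
               for k in range(n + 1)
               if (k - n * p1) ** 2 / var > threshold)

for n in (20, 100, 1000, 5000):
    print(n, round(prob_d_exceeds(n, 0.5), 4))
```

Summing the exact binomial pmf avoids Monte Carlo noise, so any remaining departure from 0.05 reflects only the finite-$n$ approximation in Theorem 10.1.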

By means of Theorem 10.1, a test of hypothesis H considered above can be
constructed based on the assignment of a probability of Type-I error. Suppose
that we wish to achieve a Type-I error probability of $\alpha$. The test suggests
that hypothesis H is rejected whenever

$$d = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i} > \chi^2_{k-1,\alpha} \tag{10.7}$$

and is accepted otherwise, where $d$ is the sample value of $D$ based on sample
values $x_j$, $j = 1, 2, \ldots, n$, $n_i$ is the observed number of sample values falling into $A_i$, and $\chi^2_{k-1,\alpha}$ takes the value such that (see Figure 10.1)

$$P(D > \chi^2_{k-1,\alpha}) = \alpha.$$

Since $D$ has a chi-squared distribution with $(k - 1)$ degrees of freedom for
large $n$, an approximate value for $\chi^2_{k-1,\alpha}$ can be found from Table A.5 in
Appendix A for the $\chi^2$ distribution when $\alpha$ is specified.

The probability $\alpha$ of a Type-I error is referred to as the significance level in this
context. As seen from Figure 10.1, it represents the area under $f_D(d)$ to the right
of $\chi^2_{k-1,\alpha}$. Letting $\alpha = 0.05$, for example, the criterion given by Equation (10.7)
implies that we reject hypothesis H whenever deviation measure $d$ as calculated
from a given set of sample values falls within the 5% region. In other words, we
expect to reject H about 5% of the time when in fact H is true. Which significance
level should be adopted in a given situation will, of course, depend on the
particular case involved. In practice, common values for $\alpha$ are 0.001, 0.01, and
0.05; a value of $\alpha$ between 5% and 1% is regarded as almost significant, a value
between 1% and 0.1% as significant, and a value below 0.1% as highly significant.

Figure 10.1 Chi-squared distribution with $(k - 1)$ degrees of freedom [plot of $f_D(d)$ against $d$]

Let us now give a step-by-step procedure for carrying out the test when
the distribution of a population $X$ is completely specified.

• Step 1: divide the range space of $X$ into $k$ mutually exclusive and numerically
convenient intervals $A_i$, $i = 1, 2, \ldots, k$. Let $n_i$ be the number of sample values
falling into $A_i$. As a rule, if the number of sample values in any $A_i$ is less than
5, combine interval $A_i$ with either $A_{i-1}$ or $A_{i+1}$.

• Step 2: compute theoretical probabilities $P(A_i) = p_i$, $i = 1, 2, \ldots, k$, by means
of the hypothesized distribution.

• Step 3: construct $d$ as given by Equation (10.7).

• Step 4: choose a value of $\alpha$ and determine from Table A.5 for the $\chi^2$
distribution of $(k - 1)$ degrees of freedom the value of $\chi^2_{k-1,\alpha}$.

• Step 5: reject hypothesis H if $d > \chi^2_{k-1,\alpha}$. Otherwise, accept H.

Example 10.1. Problem: 300 light bulbs are tested for their burning time $t$ (in
hours), and the result is shown in Table 10.1. Suppose that random burning
time $T$ is postulated to be exponentially distributed with mean burning time
$1/\lambda = 200$ hours; that is, $\lambda = 0.005$ per hour, and

$$f_T(t) = 0.005\, e^{-0.005t}, \quad t \ge 0. \tag{10.8}$$

Test this hypothesis by using the $\chi^2$ test at the 5% significance level.

Answer: the necessary steps in carrying out the $\chi^2$ test are indicated in Table 10.2.
The first column gives intervals $A_i$, which are chosen in this case to be the
intervals of $t$ given in Table 10.1. The theoretical probabilities $P(A_i) = p_i$ in the
third column are easily calculated by using Equation (10.8). For example,

$$p_1 = P(A_1) = \int_0^{100} 0.005\, e^{-0.005t}\, dt = 1 - e^{-0.5} = 0.39;$$

$$p_2 = P(A_2) = \int_{100}^{200} 0.005\, e^{-0.005t}\, dt = 1 - e^{-1} - 0.39 = 0.24.$$


Table 10.1 Sample values for Example 10.1

Burning time, t (hours)    Number
t < 100                    121
100 ≤ t < 200              78
200 ≤ t < 300              43
t ≥ 300                    58
                           n = 300
Table 10.2 Table for test for Example 10.1

Interval, A_i     n_i    p_i     np_i    n_i^2/np_i
t < 100           121    0.39    117     125.1
100 ≤ t < 200     78     0.24    72      84.5
200 ≤ t < 300     43     0.15    45      41.1
t ≥ 300           58     0.22    66      51.0
Total             300    1.00    300     301.7

Note: n_i, observed number of occurrences; p_i, theoretical P(A_i).


For convenience, the theoretical numbers of occurrences as predicted by the
model are given in the fourth column of Table 10.2, which, when compared
with the values in the second column, give a measure of goodness of fit of the
model to the data. Column 5 ($n_i^2/np_i$) is included in order to facilitate the
calculation of $d$. Thus, from Equation (10.7) we have

$$d = \sum_{i=1}^{k} \frac{n_i^2}{np_i} - n = 301.7 - 300 = 1.7.$$
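The shortcut used for $d$, subtracting $n$ from the total of column 5, follows from expanding the square in Equation (10.7) and using $\sum_i n_i = n$ and $\sum_i p_i = 1$:

$$
\begin{aligned}
d = \sum_{i=1}^{k}\frac{(n_i - np_i)^2}{np_i}
  &= \sum_{i=1}^{k}\frac{n_i^2}{np_i} - 2\sum_{i=1}^{k} n_i + n\sum_{i=1}^{k} p_i \\
  &= \sum_{i=1}^{k}\frac{n_i^2}{np_i} - 2n + n
   = \sum_{i=1}^{k}\frac{n_i^2}{np_i} - n.
\end{aligned}
$$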

Now, $k = 4$. From Table A.5 for the $\chi^2$ distribution with three degrees of
freedom, we find

$$\chi^2_{3,0.05} = 7.815.$$

Since $d < \chi^2_{3,0.05}$, we accept at the 5% significance level the hypothesis that the
observed data represent a sample from an exponential distribution with
$\lambda = 0.005$.
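Steps 1 to 5 can be sketched in a few lines of Python for Example 10.1. This is an illustration only: the helper names are hypothetical, and the critical value 7.815 is copied from Table A.5 rather than computed. Because the sketch keeps the $p_i$ at full precision instead of rounding them to two decimals, it gives $d \approx 1.84$ rather than 1.7; the conclusion is the same.

```python
import math

def pearson_d(counts, probs):
    """Deviation measure d = sum_i (n_i - n p_i)^2 / (n p_i), Equation (10.7)."""
    n = sum(counts)
    return sum((ni - n * pi) ** 2 / (n * pi) for ni, pi in zip(counts, probs))

def exponential_interval_probs(rate, cuts):
    """p_i for an exponential distribution with the given rate over the intervals
    [0, c_1), [c_1, c_2), ..., [c_last, infinity)."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t)   # exp(-inf) == 0.0, so cdf(inf) == 1.0
    edges = [0.0] + list(cuts) + [math.inf]
    return [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]

counts = [121, 78, 43, 58]                       # observed n_i, Table 10.1
probs = exponential_interval_probs(0.005, [100, 200, 300])
d = pearson_d(counts, probs)
crit = 7.815                                     # chi-squared, k - 1 = 3 d.o.f., alpha = 0.05

print([round(p, 3) for p in probs], round(d, 2))
print("reject H" if d > crit else "accept H")
```

Passing the interval cut points rather than the probabilities keeps Step 2 (computing $p_i$ from the hypothesized distribution) separate from Step 3 (computing $d$), so `pearson_d` can be reused with any hypothesized distribution.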

Example 10.2. Problem: a six-year accident record of 7842 California drivers
is given in Table 8.2. On the basis of these sample values, test the hypothesis
that $X$, the number of accidents in six years per driver, is Poisson-distributed
with mean rate $\lambda = 0.08$ per year at the 1% significance level.

Answer: since $X$ is discrete, a natural choice of intervals $A_i$ is those centered
around the discrete values, as indicated in the first column of Table 10.3. Note
that interval $x > 5$ would be combined with $4 < x \le 5$ if number $n_7$ were less
than 5.

The hypothesized distribution for $X$ is

$$p_X(x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!} = \frac{(0.48)^x e^{-0.48}}{x!}, \quad x = 0, 1, 2, \ldots \tag{10.9}$$
Table 10.3 Table for test for Example 10.2

Interval, A_i     n_i     p_i       np_i    n_i^2/np_i
x ≤ 0             5147    0.6188    4853    5459
0 < x ≤ 1         1859    0.2970    2329    1484
1 < x ≤ 2         595     0.0713    559     633
2 < x ≤ 3         167     0.0114    89      313
3 < x ≤ 4         54      0.0013    10      292
4 < x ≤ 5         14      0.0001    1       196
x > 5             6       0.0001    1       36
Total             7842    1.0       7842    8413

Note: n_i, observed number of occurrences; p_i, theoretical P(A_i).


We thus have

$$P(A_i) = p_i = \frac{(0.48)^{i-1} e^{-0.48}}{(i-1)!}, \quad i = 1, 2, \ldots, 6,$$

$$P(A_7) = p_7 = 1 - \sum_{i=1}^{6} p_i.$$

These values are indicated in the third column of Table 10.3.

Column 5 of Table 10.3 gives

$$d = \sum_{i=1}^{k} \frac{n_i^2}{np_i} - n = 8413 - 7842 = 571.$$

With $k = 7$, the value of $\chi^2_{k-1,\alpha} = \chi^2_{6,0.01}$ is found from Table A.5 to be

$$\chi^2_{6,0.01} = 16.812.$$

Since $d > \chi^2_{6,0.01}$, the hypothesis is rejected at the 1% significance level.


10.2.2 THE CASE OF ESTIMATED PARAMETERS 

Let us now consider a more common situation in which parameters in the
hypothesized distribution also need to be estimated from the data.

A natural procedure for a goodness-of-fit test in this case is first to estimate
the parameters by using one of the methods developed in Chapter 9 and then to
follow the $\chi^2$ test for known parameters, already discussed in Section 10.2.1. In
doing so, however, a complication arises in that theoretical probabilities $p_i$
defined by Equation (10.2) are, being functions of the distribution parameters,
functions of the sample. The statistic $D$ now takes the form

$$D = \sum_{i=1}^{k} \frac{(N_i - n\hat{P}_i)^2}{n\hat{P}_i}, \tag{10.10}$$

where $\hat{P}_i$ is an estimator for $p_i$ and is thus a statistic. We see that $D$ is now
a much more complicated function of $X_1, X_2, \ldots, X_n$. The important question
to be answered is: what is the new distribution of $D$?

The problem of determining the limiting distribution of $D$ in this situation
was first considered by Fisher (1922, 1924), who showed that, as $n \to \infty$, the
distribution of $D$ needs to be modified, and the modification obviously depends
on the method of parameter estimation used. Fortunately, for a class of
important methods of estimation, such as the maximum likelihood method,
the modification required is a simple one; namely, statistic $D$ still approaches a
chi-squared distribution as $n \to \infty$ but now with $(k - r - 1)$ degrees of freedom,
where $r$ is the number of parameters in the hypothesized distribution to be
estimated. In other words, it is only necessary to reduce the number of degrees
of freedom in the limiting distribution defined by Equation (10.5) by one for
each parameter estimated from the sample.

We can now state a step-by-step procedure for the case in which $r$ parameters
in the distribution are to be estimated from the data.

• Step 1: divide the range space of $X$ into $k$ mutually exclusive and numerically
convenient intervals $A_i$, $i = 1, \ldots, k$. Let $n_i$ be the number of sample values
falling into $A_i$. As a rule, if the number of sample values in any $A_i$ is less than 5,
combine interval $A_i$ with either $A_{i-1}$ or $A_{i+1}$.

• Step 2: estimate the $r$ parameters by the method of maximum likelihood from
the data.

• Step 3: compute theoretical probabilities $P(A_i) = p_i$, $i = 1, \ldots, k$, by means of
the hypothesized distribution with estimated parameter values.

• Step 4: construct $d$ as given by Equation (10.7).

• Step 5: choose a value of $\alpha$ and determine from Table A.5 for the $\chi^2$
distribution of $(k - r - 1)$ degrees of freedom the value of $\chi^2_{k-r-1,\alpha}$. It is
assumed, of course, that $k - r - 1 > 0$.

• Step 6: reject hypothesis H if $d > \chi^2_{k-r-1,\alpha}$. Otherwise, accept H.

Example 10.3. Problem: vehicle arrivals at a toll gate on the New York State 
Thruway were recorded. The vehicle counts at one-minute intervals were taken 
for 106 minutes and are given in Table 10.4. On the basis of these observations, 
determine whether a Poisson distribution is appropriate for X, the number of 
arrivals per minute, at the 5% significance level. 
Table 10.4 One-minute arrivals, for Example 10.3

Vehicles per minute (No.)    Number of occurrences
0                            0
1                            0
2                            1
3                            3
4                            5
5                            7
6                            13
7                            12
8                            8
9                            9
10                           13
11                           10
12                           5
13                           6
14                           4
15                           5
16                           4
17                           0
18                           1
                             n = 106


Answer: the hypothesized distribution is

$$p_X(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \tag{10.11}$$

where parameter $\lambda$ needs to be estimated from the data. Thus, $r = 1$.

To proceed, we first determine appropriate intervals $A_i$ such that $n_i \ge 5$ for
all $i$; these are shown in the first column of Table 10.5. Hence, $k = 11$.

The maximum likelihood estimate for $\lambda$ is given by

$$\hat{\lambda} = \bar{x} = \frac{1}{106}\sum_{j=1}^{106} x_j = 9.09.$$

The substitution of this value for parameter $\lambda$ in Equation (10.11) permits us to
calculate probabilities $P(A_i) = p_i$. For example,

$$p_1 = \sum_{j=0}^{4} p_X(j) = \sum_{j=0}^{4} \frac{(9.09)^j e^{-9.09}}{j!} = 0.052,$$

$$p_2 = p_X(5) = 0.058.$$
Table 10.5 Table for test for Example 10.3

Interval, A_i     n_i    p_i      np_i     n_i^2/np_i
0 ≤ x < 5         9      0.052    5.51     14.70
5 ≤ x < 6         7      0.058    6.15     7.97
6 ≤ x < 7         13     0.088    9.33     18.11
7 ≤ x < 8         12     0.115    12.19    11.81
8 ≤ x < 9         8      0.131    13.89    4.61
9 ≤ x < 10        9      0.132    13.99    5.79
10 ≤ x < 11       13     0.120    12.72    13.29
11 ≤ x < 12       10     0.099    10.49    9.53
12 ≤ x < 13       5      0.075    7.95     3.14
13 ≤ x < 14       6      0.054    5.72     6.29
x ≥ 14            14     0.076    8.06     24.32
Total             106    1.0      106      119.56


These theoretical probabilities are given in the third column of Table 10.5.
From column 5 of Table 10.5, we obtain

$$d = \sum_{i=1}^{k} \frac{n_i^2}{np_i} - n = 119.56 - 106 = 13.56.$$

Table A.5 with $\alpha = 0.05$ and $k - r - 1 = 9$ degrees of freedom gives

$$\chi^2_{9,0.05} = 16.92.$$

Since $d < \chi^2_{9,0.05}$, the hypothesized distribution with $\lambda = 9.09$ is accepted at
the 5% significance level.
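For Example 10.3 the computation needs one extra step, the maximum likelihood estimate $\hat{\lambda} = \bar{x}$ (Step 2). The sketch below hard-codes the grouping of Table 10.5 instead of implementing the merging rule of Step 1, and takes 16.92 from Table A.5; the variable names are illustrative. Carrying full precision it gives $d \approx 13.0$, a little below the hand-computed 13.56; both are well under 16.92, so the conclusion is the same.

```python
import math

# Table 10.4: vehicles per minute -> number of one-minute intervals with that count
freq = {0: 0, 1: 0, 2: 1, 3: 3, 4: 5, 5: 7, 6: 13, 7: 12, 8: 8, 9: 9,
        10: 13, 11: 10, 12: 5, 13: 6, 14: 4, 15: 5, 16: 4, 17: 0, 18: 1}

n = sum(freq.values())                              # 106 one-minute intervals
lam = sum(x * f for x, f in freq.items()) / n       # MLE of lambda: the sample mean

def poisson_pmf(x, rate):
    return rate ** x * math.exp(-rate) / math.factorial(x)

# Grouping of Table 10.5: {0,...,4}, {5}, {6}, ..., {13}, then {14, 15, ...}
groups = [range(0, 5)] + [range(x, x + 1) for x in range(5, 14)]
probs = [sum(poisson_pmf(x, lam) for x in g) for g in groups]
probs.append(1.0 - sum(probs))                      # upper tail, x >= 14
counts = [sum(freq[x] for x in g) for g in groups]
counts.append(sum(f for x, f in freq.items() if x >= 14))

d = sum((ni - n * pi) ** 2 / (n * pi) for ni, pi in zip(counts, probs))
k, r = len(counts), 1                               # 11 intervals, 1 estimated parameter
crit = 16.92                                        # chi-squared, k - r - 1 = 9 d.o.f., alpha = 0.05

print(f"lambda_hat = {lam:.2f}, k = {k}, d = {d:.2f}")
print("reject H" if d > crit else "accept H")
```

The last interval is assigned the complementary probability $1 - \sum p_i$ so that the $p_i$ sum to one, mirroring the open-ended interval $x \ge 14$ in Table 10.5.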

Example 10.4. Problem: based upon the snowfall data given in Problem 8.2(g)
from 1909 to 1979, test the hypothesis that the Buffalo yearly snowfall can be
modeled by a normal distribution at the 5% significance level.

Answer: for this problem, the assumed distribution for $X$, the Buffalo yearly
snowfall, measured in inches, is $N(m, \sigma^2)$, where $m$ and $\sigma^2$ must be estimated
from the data. Since the maximum likelihood estimators for $m$ and $\sigma^2$ are
$\hat{M} = \bar{X}$ and $\hat{\Sigma}^2 = [(n - 1)/n]S^2$, respectively, we have

$$\hat{m} = \bar{x} = \frac{1}{70}\sum_{j=1}^{70} x_j = 83.6,$$

$$\hat{\sigma}^2 = \frac{1}{70}\sum_{j=1}^{70} (x_j - 83.6)^2 = 777.4.$$
Table 10.6 Table for test for Example 10.4

Interval, A_i     n_i    p_i      np_i     n_i^2/np_i
x < 56            13     0.161    11.27    15.00
56 ≤ x < 72       10     0.178    12.46    8.03
72 ≤ x < 88       20     0.224    15.68    25.51
88 ≤ x < 104      13     0.205    14.35    11.78
104 ≤ x < 120     8      0.136    9.52     6.72
x ≥ 120           6      0.096    6.72     5.36
Total             70     1.0      70       72.40


With intervals $A_i$ defined as shown in the first column of Table 10.6, theoretical
probabilities $P(A_i)$ can now be calculated with the aid of Table A.3. For
example, the first two of these probabilities are

$$P(A_1) = P(X < 56) = P\left(U < \frac{56 - 83.6}{(777.4)^{1/2}}\right) = F_U(-0.990) = 1 - F_U(0.990) = 1 - 0.8389 = 0.161;$$

$$P(A_2) = P(56 \le X < 72) = P(-0.990 \le U < -0.416) = [1 - F_U(0.416)] - [1 - F_U(0.990)] = 0.339 - 0.161 = 0.178.$$

The information given above allows us to construct Table 10.6. Hence,
we have

$$d = \sum_{i=1}^{k} \frac{n_i^2}{np_i} - n = 72.40 - 70 = 2.40.$$

The number of degrees of freedom in this case is $k - r - 1 = 6 - 2 - 1 = 3$.
Table A.5 thus gives

$$\chi^2_{3,0.05} = 7.815.$$

Since $d < \chi^2_{3,0.05}$, normal distribution $N(83.6, 777.4)$ is acceptable at the 5%
significance level.
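The normal probabilities in Table 10.6 can also be reproduced without Table A.3 by using $F_U(u) = \frac{1}{2}[1 + \operatorname{erf}(u/\sqrt{2})]$ and Python's standard `math.erf`. Since the raw snowfall data are not reproduced in this chapter, the sketch below starts from the estimates $\hat{m} = 83.6$ and $\hat{\sigma}^2 = 777.4$ and the counts of Table 10.6. It yields $d \approx 2.38$, which agrees with 2.40 up to the rounding in the table.

```python
import math

def normal_cdf(x, mean, var):
    """F_X(x) for X ~ N(mean, var), via the error function instead of Table A.3."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

m_hat, var_hat = 83.6, 777.4              # maximum likelihood estimates from the text
cuts = [56, 72, 88, 104, 120]             # interval boundaries of Table 10.6 (inches)
counts = [13, 10, 20, 13, 8, 6]           # observed n_i, Table 10.6
n = sum(counts)

cdf_values = [0.0] + [normal_cdf(c, m_hat, var_hat) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
d = sum((ni - n * pi) ** 2 / (n * pi) for ni, pi in zip(counts, probs))
dof = len(counts) - 2 - 1                 # k - r - 1, with r = 2 estimated parameters
crit = 7.815                              # chi-squared, 3 d.o.f., alpha = 0.05 (Table A.5)

print([round(p, 3) for p in probs], round(d, 2), dof)
print("reject H" if d > crit else "accept H")
```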

Before leaving this section, let us remark again that statistic $D$ in the $\chi^2$ test is
$\chi^2$-distributed only when $n \to \infty$. It is thus a large-sample test. As a rule, $n > 50$
is considered satisfactory for fulfilling the large-sample requirement.
10.3 KOLMOGOROV-SMIRNOV TEST

The so-called Kolmogorov-Smirnov goodness-of-fit test, referred to as the K-S
test in the rest of this chapter, is based on a statistic that measures the deviation
of the observed cumulative histogram from the hypothesized cumulative distribution
function.

Given a set of sample values $x_1, x_2, \ldots, x_n$ observed from a population $X$, a
cumulative histogram can be constructed by (a) arranging the sample values in
increasing order of magnitude, denoted here by $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, (b) determining
the observed distribution function of $X$ at $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, denoted by
$F^o[x_{(1)}], F^o[x_{(2)}], \ldots, F^o[x_{(n)}]$, from relations $F^o[x_{(i)}] = i/n$, and (c) connecting the values
of $F^o[x_{(i)}]$ by straight-line segments.

The test statistic to be used in this case is

$$D_2 = \max_{1 \le i \le n}\left\{\left|F^o[X_{(i)}] - F_X[X_{(i)}]\right|\right\} = \max_{1 \le i \le n}\left\{\left|\frac{i}{n} - F_X[X_{(i)}]\right|\right\}, \tag{10.12}$$

where $X_{(i)}$ is the $i$th-order statistic of the sample. Statistic $D_2$ thus measures the
maximum of the absolute values of the $n$ differences between the observed probability
distribution function (PDF) and the hypothesized PDF evaluated for the observed
samples. In the case where parameters in the hypothesized distribution must be
estimated, the values for $F_X[x_{(i)}]$ are obtained by using estimated parameter
values.

While the distribution of $D_2$ is difficult to obtain analytically, its distribution
function at various values can be computed numerically and tabulated. It can be
shown that the probability distribution of $D_2$ is independent of the hypothesized
distribution and is a function only of $n$, the sample size (e.g. see Massey, 1951).

The execution of the K-S test now follows that of the $\chi^2$ test. At a specified
significance level $\alpha$, the operating rule is to reject hypothesis H if $d_2 > c_{n,\alpha}$;
otherwise, accept H. Here, $d_2$ is the sample value of $D_2$, and the value of $c_{n,\alpha}$ is
defined by

$$P(D_2 > c_{n,\alpha}) = \alpha. \tag{10.13}$$

The values of $c_{n,\alpha}$ for $\alpha = 0.01$, 0.05, and 0.10 are given in Table A.6 in
Appendix A as functions of $n$.

It is instructive to note the important differences between this test and the $\chi^2$
test. Whereas the $\chi^2$ test is a large-sample test, the K-S test is valid for all values
of $n$. Furthermore, the K-S test utilizes sample values in their unaltered and
unaggregated form, whereas data lumping is necessary in the execution of the
$\chi^2$ test. On the negative side, the K-S test is strictly valid only for continuous
distributions. We also remark that the values of $c_{n,\alpha}$ given in Table A.6 are based
on a completely specified hypothesized distribution. When the parameter values
must be estimated, no rigorous method of adjustment is available. In these cases,
it can be stated only that the values of $c_{n,\alpha}$ should be somewhat reduced.

The step-by-step procedure for executing the K-S test is now outlined as
follows:

• Step 1: rearrange sample values $x_1, x_2, \ldots, x_n$ in increasing order of magnitude
and label them $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.

• Step 2: determine observed distribution function $F^o(x)$ at each $x_{(i)}$ by using
$F^o[x_{(i)}] = i/n$.

• Step 3: determine the theoretical distribution function $F_X(x)$ at each $x_{(i)}$ by
using the hypothesized distribution. Parameters of the distribution are estimated
from the data if necessary.

• Step 4: form the differences $|F^o[x_{(i)}] - F_X[x_{(i)}]|$ for $i = 1, 2, \ldots, n$.

• Step 5: calculate

$$d_2 = \max_{1 \le i \le n}\left\{\left|F^o[x_{(i)}] - F_X[x_{(i)}]\right|\right\}.$$

The determination of this maximum value requires enumeration of $n$ quantities.
This labor can be somewhat reduced by plotting $F^o(x)$ and $F_X(x)$ as
functions of $x$ and noting the location of the maximum by inspection.

• Step 6: choose a value of $\alpha$ and determine from Table A.6 the value of $c_{n,\alpha}$.

• Step 7: reject hypothesis H if $d_2 > c_{n,\alpha}$. Otherwise, accept H.

Example 10.5. Problem: ten measurements of the tensile strength of one type
of engineering material are made. In dimensionless form, they are 30.1, 30.5,
28.7, 31.6, 32.5, 29.0, 27.4, 29.1, 33.5, and 31.0. On the basis of this data set, test
the hypothesis that the tensile strength follows a normal distribution at the 5%
significance level.

Answer: a reordering of the data yields $x_{(1)} = 27.4$, $x_{(2)} = 28.7$, ..., $x_{(10)} = 33.5$.
The determination of $F^o(x_{(i)})$ is straightforward. We have, for example,

$$F^o(27.4) = 0.1, \quad F^o(28.7) = 0.2, \quad \ldots, \quad F^o(33.5) = 1.$$

With regard to the theoretical distribution function, estimates of the mean and
variance are first obtained from

$$\hat{m} = \bar{x} = \frac{1}{10}\sum_{j=1}^{10} x_j = 30.3,$$

$$\hat{\sigma}^2 = \frac{1}{10}\sum_{j=1}^{10} (x_j - \bar{x})^2 = 3.14.$$
The values of $F_X[x_{(i)}]$ can now be found based on distribution $N(30.3, 3.14)$ for
$X$. For example, with the aid of Table A.3 for standardized normal random
variable $U$, we have

$$F_X(27.4) = F_U\left(\frac{27.4 - 30.3}{(3.14)^{1/2}}\right) = F_U(-1.64) = 1 - F_U(1.64) = 1 - 0.9495 = 0.0505,$$

$$F_X(28.7) = F_U\left(\frac{28.7 - 30.3}{(3.14)^{1/2}}\right) = F_U(-0.90) = 1 - 0.8159 = 0.1841,$$

and so on.

In order to determine $d_2$, it is instructive to plot $F^o(x)$ and $F_X(x)$ as
functions of $x$, as shown in Figure 10.2. It is clearly seen from the figure that
the maximum of the differences between $F^o(x)$ and $F_X(x)$ occurs at $x = x_{(4)} = 29.1$.
Hence,

$$d_2 = |F^o(29.1) - F_X(29.1)| = 0.4 - 0.2483 = 0.1517.$$

With $\alpha = 0.05$ and $n = 10$, Table A.6 gives

$$c_{10,0.05} = 0.41.$$



Figure 10.2 $F^o(x)$ and $F_X(x)$ in Example 10.5

Since $d_2 < c_{10,0.05}$, we accept normal distribution $N(30.3, 3.14)$ at the 5% significance
level.

Let us remark that, since the parameter values were also estimated from the
data, it is more appropriate to compare $d_2$ with a value somewhat smaller than
0.41. In view of the fact that the value of $d_2$ is well below 0.41, we are safe in
making the conclusion given above.
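Steps 1 to 7 of the K-S procedure, applied to Example 10.5, can be sketched as follows. The helper names are hypothetical, $c_{10,0.05} = 0.41$ is copied from Table A.6, and the normal distribution function is evaluated with Python's `math.erf` in place of Table A.3. The statistic follows Equation (10.12) as written, comparing $F_X$ with $i/n$ at each order statistic. With the unrounded estimates $\bar{x} = 30.34$ and $\hat{\sigma}^2 = 3.1424$ it gives $d_2 \approx 0.158$ at $x = 29.1$, slightly above the 0.1517 obtained with $\hat{m}$ rounded to 30.3; the decision is unchanged.

```python
import math

def normal_cdf(x, mean, var):
    """F_X(x) for X ~ N(mean, var), via the error function instead of Table A.3."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def ks_d2(sample, cdf):
    """D_2 of Equation (10.12): the largest |i/n - F_X(x_(i))| over the order
    statistics x_(1) <= ... <= x_(n). Returns (d2, x at which it occurs)."""
    xs = sorted(sample)                                   # Step 1
    n = len(xs)
    return max((abs((i + 1) / n - cdf(x)), x)             # Steps 2 to 5
               for i, x in enumerate(xs))

data = [30.1, 30.5, 28.7, 31.6, 32.5, 29.0, 27.4, 29.1, 33.5, 31.0]
n = len(data)
m_hat = sum(data) / n                                     # MLE of the mean
var_hat = sum((x - m_hat) ** 2 for x in data) / n         # MLE of the variance (divisor n)

d2, x_max = ks_d2(data, lambda x: normal_cdf(x, m_hat, var_hat))
c = 0.41                                                  # c_{10,0.05}, Table A.6

print(f"m = {m_hat:.2f}, var = {var_hat:.4f}, d2 = {d2:.3f} at x = {x_max}")
print("reject H" if d2 > c else "accept H")               # Steps 6 and 7
```

Returning the location of the maximum alongside $d_2$ replaces the graphical inspection suggested in Step 5.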
FURTHER READING AND COMMENTS

We have been rather selective in our choice of topics in this chapter. A number 
of important areas in hypotheses testing are not included, but they can be found 
in more complete texts devoted to statistical inference, such as the following: 

Lehmann, E.L., 1959, Testing Statistical Hypotheses, John Wiley & Sons Inc., New York.


PROBLEMS 

10.1 In the $\chi^2$ test, is a hypothesized distribution more likely to be accepted at $\alpha = 0.05$
than at $\alpha = 0.01$? Explain your answer.

10.2 To test whether or not a coin is fair, it is tossed 100 times with the following 
outcome: heads 41 times, and tails 59 times. Is it fair on the basis of these tosses at 
the 5% significance level? 

10.3 Based upon telephone numbers listed on a typical page of a telephone directory, 
test the hypothesis that the last digit of the telephone numbers is equally likely to be 
any number from 0 to 9 at the 5% significance level. 

10.4 The daily output of a production line is normally distributed with mean $m = 8000$
items and standard deviation $\sigma = 1000$ items. A second production line is set up,
and 100 daily output readings are taken, as shown in Table 10.7. On the basis of
this sample, does the second production line behave in the same statistical manner
as the first? Use $\alpha = 0.01$.

Table 10.7 Production-line data for Problem 10.4

Daily output interval    Number of occurrences
< 4000                   3
4000-5000                3
5000-6000                7
6000-7000                16
7000-8000                27
8000-9000                22
9000-10000               11
10000-11000              8
11000-12000              2
> 12000                  1
                         n = 100

10.5 In a given plant, a sample of a given number of production items was taken from each of the five production lines; the number of defective items was recorded, as shown in Table 10.8. Test the hypothesis that the proportion of defects is constant from one production line to another. Use $\alpha = 0.01$.


Table 10.8 Production-line data for Problem 10.5

Production line    Number of defects
1                  11
2                  13
3                  9
4                  12
5                  8


10.6 We have rejected in Example 10.2 the Poisson distribution with $\lambda = 0.08$ on the basis of accident data at the 1% significance level. At the same $\alpha$:

(a) Would a Poisson distribution with $\lambda$ estimated from the data be acceptable?

(b) Would a negative binomial distribution be more appropriate?

10.7 The data on the number of arrivals of cars at an intersection in 360 10-s intervals are as shown in Table 10.9.

Three models are proposed: 
model 1: 

e^^ 

^ = 0 , 1 ,...; 
Table 10.9 Arrival of cars at intersection, for Problem 10.7

Cars per interval    Number of observations
0                    139
1                    128
2                    55
3                    25
4                    13
                     n = 360


model 2:

$$p_X(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, \ldots,$$

where $\lambda$ is estimated from the data;
model 3:

$$p_X(x) = \binom{x + k - 1}{x} p^{k} (1 - p)^{x}, \quad x = 0, 1, \ldots,$$

where $k$ and $p$ are estimated from the data.

(a) Use the $\chi^2$ test; are these models acceptable at the 5% significance level?

(b) In your opinion, which is a better model? Explain your answer.

Note: for model 3,

$$m_X = \frac{k(1 - p)}{p}, \qquad \sigma_X^2 = \frac{k(1 - p)}{p^{2}}.$$

10.8 Car pooling is encouraged in a city. A survey of 321 passenger vehicles coming into the city gives the car occupancy profile shown in Table 10.10. Suggest a probabilistic model for X, the number of passengers per vehicle, and test your hypothesized distribution at $\alpha = 0.05$ on the basis of this survey.


Table 10.10 Car occupancy (number of passengers per vehicle, excluding the driver), for Problem 10.8

Occupancy    Vehicles (No.)
0            224
1            47
2            31
3            16
≥ 4          3
             n = 321
10.9 Problem 8.2(c) gives 100 measurements of time gaps (see Table 8.4). On the basis
of these data, postulate a likely distribution for X and test your hypothesis at the 
5% significance level. 

10.10 Consider the data given in Problem 8.2(d) for the sum of two consecutive time 
gaps. Postulate a likely distribution for X and test your hypothesis at the 5% 
significance level. 

10.11 Problem 8.2(e) gives data for one-minute vehicle arrivals (see Table 8.5). Postulate 
a likely distribution for X and test your hypothesis at the 5% significance level. 

10.12 Problem 8.2(h) gives a histogram for X, the peak combustion pressure (see Figure 
8.4). On the basis of these data, postulate a likely distribution for X and test your 
hypothesis at the 5% significance level. 

10.13 Suppose that the number of drivers sampled is 200. Based on the histogram given 
in Problem 8.2(i) (see Figure 8.5), postulate a likely distribution for Xi and test 
your hypothesis at the 1% significance level. 

10.14 Problem 8.2(j) gives a histogram for X, the number of blemishes in television tubes 
(see Figure 8.6). On the basis of this sample, postulate a likely distribution for X 
and test your hypothesis at the 5% significance level. 

10.15 A total of 24 readings of the annual sediment load (in $10^6$ tons) in the Colorado River at the Grand Canyon are (arranged in increasing order of magnitude) 49, 50, 50, 66, 70, 75, 84, 85, 98, 118, 122, 135, 143, 146, 157, 172, 177, 190, 225, 235, 265, 270, 400, 480. Using the Kolmogorov-Smirnov test at the 5% significance level, test the hypothesis that the annual sediment load follows a lognormal distribution (data are taken from Beard, 1962).

10.16 For the snowfall data given in Problem 8.2(g) (see Table 8.6), use the Kolmogorov-Smirnov test to test the normal distribution hypothesis on the basis of snowfall data from 1909-2002.





11 


Linear Models and Linear 
Regression 


The tools developed in Chapters 9 and 10 for parameter estimation and model verification are applied in this chapter to a very useful class of models encountered in science and engineering. A commonly occurring situation is one in which a random quantity, $Y$, is a function of one or more independent (and deterministic) variables $x_1, x_2, \ldots$, and $x_m$. For example, wind load ($Y$) acting on a structure is a function of height ($x$); the intensity ($Y$) of strong-motion earthquakes is dependent on the distance from the epicenter ($x$); housing price ($Y$) is a function of location ($x_1$) and age ($x_2$); and chemical yield ($Y$) may be related to temperature ($x_1$), pressure ($x_2$), and acid content ($x_3$).

Given a sample of $Y$ values with their associated values of $x_i$, $i = 1, 2, \ldots, m$, we are interested in estimating, on the basis of this sample, the relationship between $Y$ and the independent variables $x_1, x_2, \ldots$, and $x_m$. In what follows, we concentrate on some simple cases of the broadly defined problem stated above.


11.1 SIMPLE LINEAR REGRESSION 

We assume in this section that random variable $Y$ is a function of only one independent variable and that their relationship is linear. By a linear relationship we mean that the mean of $Y$, $E\{Y\}$, is known to be a linear function of $x$, that is,

$$E\{Y\} = \alpha + \beta x. \tag{11.1}$$


The two constants, intercept $\alpha$ and slope $\beta$, are unknown and are to be estimated from a sample of $Y$ values with their associated values of $x$. Note that $E\{Y\}$ is a function of $x$. In any single experiment, $x$ will assume a certain value $x_i$, and the mean of $Y$ will take the value

$$E\{Y_i\} = \alpha + \beta x_i. \tag{11.2}$$

Random variable $Y$ is, of course, itself a function of $x$. If we define a random variable $E$ by

$$E = Y - (\alpha + \beta x), \tag{11.3}$$

we can write

$$Y = \alpha + \beta x + E, \tag{11.4}$$

where $E$ has mean 0 and variance $\sigma^2$, which is identical to the variance of $Y$. The value of $\sigma^2$ is not known in general, but it is assumed to be a constant and not a function of $x$.

Equation (11.4) is a standard expression of a simple linear regression model. The unknown parameters $\alpha$ and $\beta$ are called regression coefficients, and random variable $E$ represents the deviation of $Y$ about its mean. As with the simple models discussed in Chapters 9 and 10, simple linear regression analysis is concerned with estimation of the regression parameters, the quality of these estimators, and model verification on the basis of the sample. We note that, instead of a simple sample such as $Y_1, Y_2, \ldots, Y_n$ as in previous cases, our sample in the present context takes the form of pairs $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$. For each value $x_i$ assigned to $x$, $Y_i$ is an independent observation from population $Y$ defined by Equation (11.4). Hence, $(x_i, Y_i)$, $i = 1, 2, \ldots, n$, may be considered as a sample from random variable $Y$ for given values $x_1, x_2, \ldots$, and $x_n$ of $x$; these $x$ values need not all be distinct but, in order to estimate both $\alpha$ and $\beta$, we will see that we must have at least two distinct values of $x$ represented in the sample.


11.1.1 LEAST SQUARES METHOD OF ESTIMATION 

As one approach to point estimation of regression parameters $\alpha$ and $\beta$, the method of least squares suggests that their estimates, $\hat{\alpha}$ and $\hat{\beta}$, be chosen so that the sum of the squared differences between observed sample values $y_i$ and the estimated expected value of $Y$, $\hat{\alpha} + \hat{\beta} x_i$, is minimized. Let us write

$$e_i = y_i - (\hat{\alpha} + \hat{\beta} x_i). \tag{11.5}$$



The least-square estimates $\hat{\alpha}$ and $\hat{\beta}$, respectively, of $\alpha$ and $\beta$ are found by minimizing

$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} [y_i - (\hat{\alpha} + \hat{\beta} x_i)]^2. \tag{11.6}$$

In the above, the sample-value pairs are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, and $e_i$, $i = 1, 2, \ldots, n$, are called the residuals. Figure 11.1 gives a graphical presentation of this procedure. We see that the residuals are the vertical distances between the observed values of $Y$, $y_i$, and the least-square estimate $\hat{\alpha} + \hat{\beta} x$ of the true regression line $\alpha + \beta x$.

The estimates $\hat{\alpha}$ and $\hat{\beta}$ are easily found based on the least-square procedure. The results are stated below as Theorem 11.1.

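Theorem 11.1: consider the simple linear regression model defined by Equation (11.4). Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be observed sample values of $Y$ with associated values of $x$. Then the least-square estimates of $\alpha$ and $\beta$ are

$$\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}, \tag{11.7}$$

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \tag{11.8}$$

where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$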

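Proof of Theorem 11.1: estimates $\hat{\alpha}$ and $\hat{\beta}$ are found by taking partial derivatives of $Q$ given by Equation (11.6) with respect to $\hat{\alpha}$ and $\hat{\beta}$, setting these derivatives to zero, and solving for $\hat{\alpha}$ and $\hat{\beta}$. Hence, we have

$$\frac{\partial Q}{\partial \hat{\alpha}} = -2 \sum_{i=1}^{n} [y_i - (\hat{\alpha} + \hat{\beta} x_i)],$$

$$\frac{\partial Q}{\partial \hat{\beta}} = -2 \sum_{i=1}^{n} x_i [y_i - (\hat{\alpha} + \hat{\beta} x_i)].$$

Upon simplifying and setting the above equations to zero, we have the so-called normal equations:

$$n\hat{\alpha} + n\bar{x}\hat{\beta} = n\bar{y}, \tag{11.9}$$

$$n\bar{x}\hat{\alpha} + \hat{\beta} \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i. \tag{11.10}$$

Their solutions are easily found to be those given by Equations (11.7) and (11.8).

To ensure that these solutions correspond to the minimum of the sum of squared residuals, we need to verify that

$$\frac{\partial^2 Q}{\partial \hat{\alpha}^2} > 0$$

and

$$D = \begin{vmatrix} \dfrac{\partial^2 Q}{\partial \hat{\alpha}^2} & \dfrac{\partial^2 Q}{\partial \hat{\alpha}\, \partial \hat{\beta}} \\[2ex] \dfrac{\partial^2 Q}{\partial \hat{\alpha}\, \partial \hat{\beta}} & \dfrac{\partial^2 Q}{\partial \hat{\beta}^2} \end{vmatrix} > 0$$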




at $\hat{\alpha}$ and $\hat{\beta}$. Elementary calculations show that

$$\frac{\partial^2 Q}{\partial \hat{\alpha}^2} = 2n > 0$$

and

$$D = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0.$$

The proof of this theorem is thus complete. Note that $D$ would be zero if all $x_i$ took the same value. Hence, at least two distinct $x_i$ values are needed for the determination of $\hat{\alpha}$ and $\hat{\beta}$.

It is instructive at this point to restate the foregoing results by using a more 
compact vector-matrix notation. As we will see, results in vector-matrix form 
facilitate calculations. Also, they permit easy generalizations when we consider 
more general regression models. 

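In terms of observed sample values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we have a system of observed regression equations

$$y_i = \alpha + \beta x_i + e_i, \quad i = 1, 2, \ldots, n. \tag{11.11}$$

Let

$$\mathbf{C} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix},$$

and let

$$\boldsymbol{\theta} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$$

Equations (11.11) can be represented by the vector-matrix equation

$$\mathbf{y} = \mathbf{C}\boldsymbol{\theta} + \mathbf{e}. \tag{11.12}$$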


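The sum of squared residuals given by Equation (11.6) is now

$$Q = \mathbf{e}^{T}\mathbf{e} = (\mathbf{y} - \mathbf{C}\boldsymbol{\theta})^{T}(\mathbf{y} - \mathbf{C}\boldsymbol{\theta}). \tag{11.13}$$

The least-square estimate of $\boldsymbol{\theta}$, $\hat{\boldsymbol{\theta}}$, is found by minimizing $Q$. Applying the variational principle discussed in Section 9.3.1.1, we have

$$\delta Q = \delta[(\mathbf{y} - \mathbf{C}\boldsymbol{\theta})^{T}(\mathbf{y} - \mathbf{C}\boldsymbol{\theta})] = -2\, \delta\boldsymbol{\theta}^{T}\mathbf{C}^{T}(\mathbf{y} - \mathbf{C}\boldsymbol{\theta}).$$

Setting $\delta Q = 0$, the solution for $\hat{\boldsymbol{\theta}}$ is obtained from the normal equation

$$\mathbf{C}^{T}(\mathbf{y} - \mathbf{C}\hat{\boldsymbol{\theta}}) = \mathbf{0}, \tag{11.14}$$

or

$$\mathbf{C}^{T}\mathbf{C}\hat{\boldsymbol{\theta}} = \mathbf{C}^{T}\mathbf{y},$$

which gives

$$\hat{\boldsymbol{\theta}} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{y}. \tag{11.15}$$

In the above, the inverse of matrix $\mathbf{C}^{T}\mathbf{C}$ exists if there are at least two distinct values of $x_i$ represented in the sample.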

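We can easily check that Equation (11.15) is identical to Equations (11.7) and (11.8) by noting that

$$\mathbf{C}^{T}\mathbf{C} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = \begin{bmatrix} n & n\bar{x} \\ n\bar{x} & \sum_{i=1}^{n} x_i^2 \end{bmatrix},$$

$$\mathbf{C}^{T}\mathbf{y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} n\bar{y} \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix},$$

and

$$\begin{aligned} \hat{\boldsymbol{\theta}} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{y} &= \begin{bmatrix} n & n\bar{x} \\ n\bar{x} & \sum_{i=1}^{n} x_i^2 \end{bmatrix}^{-1} \begin{bmatrix} n\bar{y} \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix} \\ &= \begin{bmatrix} \bar{y} - \bar{x}\left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right)\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)^{-1} \\ \left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right)\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)^{-1} \end{bmatrix} \\ &= \begin{bmatrix} \bar{y} - \hat{\beta}\bar{x} \\ \left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right]\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{-1} \end{bmatrix}. \end{aligned}$$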
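The agreement between the matrix solution (11.15) and Equations (11.7) and (11.8) can also be checked numerically. The plain-Python sketch below solves the 2 × 2 normal equations directly and compares the result with the closed form; the function names and the four data pairs are hypothetical, chosen only for illustration.

```python
def fit_normal_equations(xs, ys):
    """Solve C^T C theta = C^T y for the simple model, C having rows [1, x_i]."""
    n = len(xs)
    # Entries of C^T C = [[n, s_x], [s_x, s_xx]] and C^T y = [s_y, s_xy]
    s_x = sum(xs)
    s_xx = sum(x * x for x in xs)
    s_y = sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * s_xx - s_x * s_x  # zero when all x_i take the same value
    if det == 0:
        raise ValueError("need at least two distinct x values")
    # Explicit inverse of the 2x2 matrix applied to C^T y
    alpha = (s_xx * s_y - s_x * s_xy) / det
    beta = (n * s_xy - s_x * s_y) / det
    return alpha, beta

def fit_closed_form(xs, ys):
    """Equations (11.7) and (11.8)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sxy / sxx
    return ybar - beta * xbar, beta

xs = [1.0, 2.0, 4.0, 5.0]   # hypothetical data
ys = [2.1, 3.9, 8.2, 9.8]
print(fit_normal_equations(xs, ys))
print(fit_closed_form(xs, ys))
```

Both functions should print the same pair up to floating-point rounding, and passing identical $x_i$ values triggers the singular case noted after Equation (11.15).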
Table 11.1 Percentage yield, $y_i$, with process temperature, $x_i$, for Example 11.1

i         1    2    3    4    5    6    7    8    9    10
x (°C)    45   50   55   60   65   70   75   80   85   90
y         43   45   48   51   55   57   59   63   66   68

Example 11.1. Problem: it is expected that the average percentage yield, $Y$, from a chemical process is linearly related to the process temperature, $x$, in °C. Determine the least-square regression line for $E\{Y\}$ on the basis of 10 observations given in Table 11.1.

Answer: in view of Equations (11.7) and (11.8), we need the following quantities:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{10}(45 + 50 + \cdots + 90) = 67.5,$$

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{1}{10}(43 + 45 + \cdots + 68) = 55.5,$$

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 2062.5,$$

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 1182.5.$$

The substitution of these values into Equations (11.7) and (11.8) gives

$$\hat{\beta} = \frac{1182.5}{2062.5} = 0.57,$$

$$\hat{\alpha} = 55.5 - 0.57(67.5) = 17.03.$$


The estimated regression line together with observed sample values is shown 
in Figure 11.2. 
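The arithmetic of Example 11.1 can be reproduced with a few lines of standard-library Python (the variable names are our own). Carrying $\hat{\beta}$ at full precision, $1182.5/2062.5 = 0.5733$, gives $\hat{\alpha} \approx 16.80$; the value 17.03 quoted above results from rounding $\hat{\beta}$ to 0.57 before substituting into Equation (11.7).

```python
# Table 11.1: process temperature x (deg C) and percentage yield y
x = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
y = [43, 45, 48, 51, 55, 57, 59, 63, 66, 68]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx                 # Equation (11.8)
alpha_hat = ybar - beta_hat * xbar   # Equation (11.7)

print(xbar, ybar, sxx, sxy)          # 67.5 55.5 2062.5 1182.5
print(round(beta_hat, 4), round(alpha_hat, 2))
```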

It is noteworthy that regression relationships are valid only for the range of $x$ values represented by the data. Thus, the estimated regression line in Example 11.1 holds only for temperatures between 45 °C and 90 °C. Extrapolation of the result beyond this range can be misleading and is not valid in general.

Another word of caution has to do with the basic linear assumption between $E\{Y\}$ and $x$. Linear regression analysis such as the one performed in Example 11.1 is based on the assumption that the true relationship between $E\{Y\}$ and $x$ is linear. Indeed, if the underlying relationship is nonlinear or nonexistent, linear regression produces meaningless results even if a straight line appears to provide a good fit to the data.

Figure 11.2 Estimated regression line and observed data for Example 11.1


11.1.2 PROPERTIES OF LEAST-SQUARE ESTIMATORS 


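The properties of the estimators for regression coefficients $\alpha$ and $\beta$ can be determined in a straightforward fashion from the vector-matrix expression, Equation (11.15). Let $A$ and $B$ denote, respectively, the estimators for $\alpha$ and $\beta$ following the method of least squares, and let

$$\hat{\boldsymbol{\Theta}} = \begin{bmatrix} A \\ B \end{bmatrix}. \tag{11.16}$$

We see from Equation (11.15) that

$$\hat{\boldsymbol{\Theta}} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{Y}, \tag{11.17}$$

where

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \tag{11.18}$$

and $Y_i$, $i = 1, 2, \ldots, n$, are independent random variables, each distributed according to Equation (11.4) with $x = x_i$. Thus, if we write

$$\mathbf{Y} = \mathbf{C}\boldsymbol{\theta} + \mathbf{E}, \tag{11.19}$$

then $\mathbf{E}$ is a zero-mean random vector with covariance matrix $\boldsymbol{\Lambda} = \sigma^2\mathbf{I}$, $\mathbf{I}$ being the $n \times n$ identity matrix.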

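The mean and variance of estimator $\hat{\boldsymbol{\Theta}}$ are now easily determined. In view of Equations (11.17) and (11.19), we have

$$E\{\hat{\boldsymbol{\Theta}}\} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}E\{\mathbf{Y}\} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}[\mathbf{C}\boldsymbol{\theta} + E\{\mathbf{E}\}] = (\mathbf{C}^{T}\mathbf{C})^{-1}(\mathbf{C}^{T}\mathbf{C})\boldsymbol{\theta} = \boldsymbol{\theta}. \tag{11.20}$$

Hence, estimators $A$ and $B$ for $\alpha$ and $\beta$, respectively, are unbiased.

The covariance matrix associated with $\hat{\boldsymbol{\Theta}}$ is given by, as seen from Equation (11.17),

$$\operatorname{cov}\{\hat{\boldsymbol{\Theta}}\} = E\{(\hat{\boldsymbol{\Theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\Theta}} - \boldsymbol{\theta})^{T}\} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\operatorname{cov}\{\mathbf{Y}\}\mathbf{C}(\mathbf{C}^{T}\mathbf{C})^{-1}.$$

But $\operatorname{cov}\{\mathbf{Y}\} = \sigma^2\mathbf{I}$; we thus have

$$\operatorname{cov}\{\hat{\boldsymbol{\Theta}}\} = \sigma^2(\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{C}(\mathbf{C}^{T}\mathbf{C})^{-1} = \sigma^2(\mathbf{C}^{T}\mathbf{C})^{-1}. \tag{11.21}$$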

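The diagonal elements of the matrix in Equation (11.21) give the variances of $A$ and $B$. In terms of the elements of $\mathbf{C}$, we can write

$$\operatorname{var}\{A\} = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}, \tag{11.22}$$

$$\operatorname{var}\{B\} = \sigma^2 \left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{-1}. \tag{11.23}$$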


It is seen that these variances decrease as sample size $n$ increases, roughly in proportion to $1/n$. Thus, it follows from our discussion in Chapter 9 that these estimators are consistent, a desirable property. We further note that, for a fixed $n$, the variance of $B$ can be reduced by selecting the $x_i$ in such a way that the denominator of Equation (11.23) is maximized; this can be accomplished by spreading the $x_i$ as far apart as possible. In Example 11.1, for example, assuming that we are free to choose the values of $x_i$, the quality of $\hat{\beta}$ is improved if one-half of the $x$ readings are taken at one extreme of the temperature range and the other half at the other extreme. However, since Equation (11.22) can be rewritten as $\operatorname{var}\{A\} = \sigma^2[1/n + \bar{x}^2/\sum_{i=1}^{n}(x_i - \bar{x})^2]$, the sampling strategy for minimizing $\operatorname{var}\{A\}$ for a fixed $n$ is to make $\bar{x}$ as close to zero as possible.
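A quick Monte Carlo experiment can make Equations (11.20), (11.22), and (11.23) concrete. The sketch below assumes a hypothetical true line ($\alpha = 2$, $\beta = 0.5$, $\sigma = 1$) and normal errors, chosen only for convenience (the mean and variance results themselves do not require normality). It reuses the temperatures of Table 11.1 as the fixed $x_i$, compares the empirical means and variances of $A$ and $B$ with the formulas, and evaluates $\operatorname{var}\{B\}$ for the two-extremes design discussed above.

```python
import random
import statistics

def least_squares(x, y):
    """Estimates (alpha_hat, beta_hat) from Equations (11.7) and (11.8)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

# Hypothetical "true" model, chosen only for this simulation
alpha, beta, sigma = 2.0, 0.5, 1.0
x = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90]   # design points of Table 11.1
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

rng = random.Random(1)                          # fixed seed for repeatability
a_draws, b_draws = [], []
for _ in range(20000):
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    a, b = least_squares(x, y)
    a_draws.append(a)
    b_draws.append(b)

var_a_theory = sigma**2 * sum(xi * xi for xi in x) / (n * sxx)   # Equation (11.22)
var_b_theory = sigma**2 / sxx                                     # Equation (11.23)

print("mean of A, B:", statistics.mean(a_draws), statistics.mean(b_draws))
print("var{A}: simulated", statistics.variance(a_draws), "theory", var_a_theory)
print("var{B}: simulated", statistics.variance(b_draws), "theory", var_b_theory)

# Same n, but half of the readings at each end of the 45-90 range
x_ext = [45] * 5 + [90] * 5
xbar_ext = sum(x_ext) / n
sxx_ext = sum((xi - xbar_ext) ** 2 for xi in x_ext)
print("var{B}, two-extremes design:", sigma**2 / sxx_ext)
```

The simulated means should sit close to $\alpha$ and $\beta$, the simulated variances close to the formulas, and the two-extremes design should give the smaller $\operatorname{var}\{B\}$.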

Are the variances given by Equations (11.22) and (11.23) minimum variances associated with any unbiased estimators for $\alpha$ and $\beta$? An answer to this important question can be found by comparing the results given by Equations (11.22) and (11.23) with the Cramér-Rao lower bounds defined in Section 9.2.2. In order to evaluate these lower bounds, a probability distribution of $Y$ must be made available. Without this knowledge, however, we can still show, in Theorem 11.2, that the least-squares technique leads to linear unbiased minimum-variance estimators for $\alpha$ and $\beta$; that is, among all unbiased estimators that are linear in $Y$, least-square estimators have minimum variance.

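Theorem 11.2: let random variable $Y$ be defined by Equation (11.4). Given a sample $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$ of $Y$ with its associated $x$ values, least-square estimators $A$ and $B$ given by Equation (11.17) are minimum variance linear unbiased estimators for $\alpha$ and $\beta$, respectively.

Proof of Theorem 11.2: the proof of this important theorem is sketched below with use of vector-matrix notation.

Consider a linear unbiased estimator of the form

$$\boldsymbol{\Theta}^{*} = [(\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T} + \mathbf{G}]\mathbf{Y}. \tag{11.24}$$

We thus wish to prove that $\mathbf{G} = \mathbf{0}$ if $\boldsymbol{\Theta}^{*}$ is to be minimum variance.

The unbiasedness requirement leads to, in view of Equation (11.19),

$$\mathbf{G}\mathbf{C} = \mathbf{0}. \tag{11.25}$$

Consider now the covariance matrix

$$\operatorname{cov}\{\boldsymbol{\Theta}^{*}\} = E\{(\boldsymbol{\Theta}^{*} - \boldsymbol{\theta})(\boldsymbol{\Theta}^{*} - \boldsymbol{\theta})^{T}\}. \tag{11.26}$$

Upon using Equations (11.19), (11.24), and (11.25) and expanding the covariance, we have

$$\operatorname{cov}\{\boldsymbol{\Theta}^{*}\} = \sigma^2[(\mathbf{C}^{T}\mathbf{C})^{-1} + \mathbf{G}\mathbf{G}^{T}].$$

Now, in order to minimize the variances associated with the components of $\boldsymbol{\Theta}^{*}$, we must minimize each diagonal element of $\mathbf{G}\mathbf{G}^{T}$. Since the $ii$th diagonal element of $\mathbf{G}\mathbf{G}^{T}$ is given by

$$(\mathbf{G}\mathbf{G}^{T})_{ii} = \sum_{j=1}^{n} g_{ij}^2,$$

where $g_{ij}$ is the $ij$th element of $\mathbf{G}$, we must have

$$g_{ij} = 0, \quad \text{for all } i \text{ and } j,$$

and we obtain

$$\mathbf{G} = \mathbf{0}. \tag{11.27}$$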




This completes the proof. The theorem stated above is a special case of the Gauss-Markov theorem.

Another interesting comparison is that between the least-square estimators for $\alpha$ and $\beta$ and their maximum likelihood estimators with an assigned distribution for random variable $Y$. It is left as an exercise to show that the maximum likelihood estimators for $\alpha$ and $\beta$ are identical to their least-square counterparts under the added assumption that $Y$ is normally distributed.


11.1.3 UNBIASED ESTIMATOR FOR $\sigma^2$


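As we have seen, the method of least squares does not lead to an estimator for the variance $\sigma^2$ of $Y$, which is in general also an unknown quantity in linear regression models. In order to propose an estimator for $\sigma^2$, an intuitive choice is

$$\hat{\Sigma}^2 = k \sum_{i=1}^{n} [Y_i - (A + Bx_i)]^2, \tag{11.28}$$

where coefficient $k$ is to be chosen so that $\hat{\Sigma}^2$ is unbiased. In order to carry out the expectation of $\hat{\Sigma}^2$, we note that [see Equation (11.7)]

$$Y_i - A - Bx_i = Y_i - (\bar{Y} - B\bar{x}) - Bx_i = (Y_i - \bar{Y}) - B(x_i - \bar{x}). \tag{11.29}$$

Hence, it follows that

$$\sum_{i=1}^{n} (Y_i - A - Bx_i)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 - B^2 \sum_{i=1}^{n} (x_i - \bar{x})^2, \tag{11.30}$$

since [see Equation (11.8)]

$$\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = B \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Upon taking expectations term by term, using $E\{\sum_{i=1}^{n} (Y_i - \bar{Y})^2\} = (n - 1)\sigma^2 + \beta^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$ and, from Equation (11.23), $E\{B^2\} = \sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2 + \beta^2$, we can show that

$$E\{\hat{\Sigma}^2\} = k E\left\{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - B^2 \sum_{i=1}^{n} (x_i - \bar{x})^2\right\} = k(n - 2)\sigma^2. \tag{11.31}$$

Hence, $\hat{\Sigma}^2$ is unbiased with $k = 1/(n - 2)$, giving

$$\hat{\Sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} [Y_i - (A + Bx_i)]^2, \tag{11.32}$$

or, in view of Equation (11.30),

$$\hat{\Sigma}^2 = \frac{1}{n - 2} \left[\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - B^2 \sum_{i=1}^{n} (x_i - \bar{x})^2\right]. \tag{11.33}$$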


Example 11.2. Problem: use the results given in Example 11.1 and determine an unbiased estimate for $\sigma^2$.

Answer: we have found in Example 11.1 that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 2062.5, \qquad \hat{\beta} = 0.57.$$

In addition, we easily obtain

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 680.5.$$

Equation (11.33) thus gives

$$\hat{\sigma}^2 = \frac{1}{8}[680.5 - (0.57)^2(2062.5)] = 1.30.$$
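Equation (11.33) can be evaluated directly from the data of Table 11.1, as in the plain-Python sketch below (names are our own). Because (11.33) subtracts two nearly equal quantities, it is sensitive to rounding: carrying $\hat{\beta}$ at full precision gives $\hat{\sigma}^2 \approx 0.32$, whereas rounding $\hat{\beta}$ to 0.57 before squaring gives about 1.30. Summing the squared residuals directly, as in Equation (11.32), yields the same full-precision value and avoids the cancellation.

```python
# Table 11.1 data (Example 11.1)
x = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
y = [43, 45, 48, 51, 55, 57, 59, 63, 66, 68]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx
alpha_hat = ybar - beta_hat * xbar

# Equation (11.33): shortcut form
sigma2_shortcut = (syy - beta_hat**2 * sxx) / (n - 2)

# Equation (11.32): sum of squared residuals
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
sigma2_residuals = sum(r * r for r in residuals) / (n - 2)

# Shortcut again, with the slope rounded to two decimals first
sigma2_rounded = (syy - 0.57**2 * sxx) / (n - 2)

print(syy)                                # 680.5
print(sigma2_shortcut, sigma2_residuals)  # both about 0.317
print(sigma2_rounded)                     # about 1.30
```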

Example 11.3. Problem: an experiment on lung tissue elasticity as a function of lung expansion properties is performed, and the measurements given in Table 11.2 are those of the tissue's Young's modulus ($Y$), in g cm$^{-2}$, at varying values of lung expansion in terms of stress ($x$), in g cm$^{-2}$. Assuming that $E\{Y\}$ is linearly related to $x$ and that $\sigma_Y^2 = \sigma^2$ (a constant), determine the least-square estimates of the regression coefficients and an unbiased estimate of $\sigma^2$.

Table 11.2 Young's modulus, y (g cm$^{-2}$), with stress, x (g cm$^{-2}$), for Example 11.3

x    2    2.5   3     5     7     9     10    12    15    16    17    18    19     20
y    9.1  19.2  18.0  31.3  40.9  32.0  54.3  49.1  73.0  91.0  79.0  68.0  110.5  130.8

Answer: in this case, we have $n = 14$. The quantities of interest are

$$\bar{x} = \frac{1}{14}(2 + 2.5 + \cdots + 20) = 11.11,$$

$$\bar{y} = \frac{1}{14}(9.1 + 19.2 + \cdots + 130.8) = 57.59,$$

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 546.09,$$

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 17{,}179.54,$$

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 2862.12.$$

The substitution of these values into Equations (11.7), (11.8), and (11.33) gives

$$\hat{\beta} = \frac{2862.12}{546.09} = 5.24,$$

$$\hat{\alpha} = 57.59 - 5.24(11.11) = -0.63,$$

$$\hat{\sigma}^2 = \frac{1}{12}[17{,}179.54 - (5.24)^2(546.09)] = 182.10.$$

The estimated regression line together with the data is shown in Figure 11.3. The estimated standard deviation is $\hat{\sigma} = \sqrt{182.10} = 13.49$ g cm$^{-2}$, and the $1\hat{\sigma}$ band is also shown in the figure.
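The quantities of Example 11.3 can be recomputed from Table 11.2 with the standard-library sketch below (variable names are our own). At full precision the unbiased estimate comes out near 181.6 rather than 182.10, the small difference again coming from rounding $\hat{\beta}$ to 5.24 before squaring in Equation (11.33).

```python
import math

# Table 11.2: stress x and Young's modulus y (both in g/cm^2)
x = [2, 2.5, 3, 5, 7, 9, 10, 12, 15, 16, 17, 18, 19, 20]
y = [9.1, 19.2, 18.0, 31.3, 40.9, 32.0, 54.3, 49.1, 73.0, 91.0, 79.0, 68.0, 110.5, 130.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx                                # Equation (11.8)
alpha_hat = ybar - beta_hat * xbar                  # Equation (11.7)
sigma2_hat = (syy - beta_hat**2 * sxx) / (n - 2)    # Equation (11.33)
sigma_hat = math.sqrt(sigma2_hat)

print(f"xbar={xbar:.2f} ybar={ybar:.2f} sxx={sxx:.2f} syy={syy:.2f} sxy={sxy:.2f}")
print(f"beta_hat={beta_hat:.2f} alpha_hat={alpha_hat:.2f} sigma2_hat={sigma2_hat:.1f} sigma_hat={sigma_hat:.2f}")
```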


11.1.4 CONFIDENCE INTERVALS FOR REGRESSION 
COEFFICIENTS 

In addition to point estimators for the slope and intercept in linear regression, it is also easy to construct confidence intervals for them and for $\alpha + \beta x$, the mean of $Y$, under certain distributional assumptions. In what follows, let us assume that $Y$ is normally distributed according to $N(\alpha + \beta x, \sigma^2)$. Since estimators $A$, $B$, and $A + Bx$ are linear functions of the sample of $Y$, they are also normal random variables. Let us note that, when sample size $n$ is large, $A$, $B$, and $A + Bx$ are expected to be approximately normally distributed as a consequence of the central limit theorem (Section 7.2.1), no matter how $Y$ is distributed.

We follow our development in Section 9.3.2 in establishing the desired 
confidence limits. Based on our experience in Section 9.3.2, the following are 
not difficult to verify: 
Figure 11.3 Estimated regression line and observed data, for Example 11.3

• Result i: let $\hat{\Sigma}^2$ be the unbiased estimator for $\sigma^2$ as defined by Equation (11.33), and let

$$D = \frac{(n - 2)\hat{\Sigma}^2}{\sigma^2}. \tag{11.34}$$

It follows from the results given in Section 9.3.2.3 that $D$ is a $\chi^2$-distributed random variable with $(n - 2)$ degrees of freedom.

• Result ii: consider random variables

$$(A - \alpha)\left[\frac{\hat{\Sigma}^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}\right]^{-1/2} \tag{11.35}$$

and

$$(B - \beta)\left[\frac{\hat{\Sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]^{-1/2}, \tag{11.36}$$

where, as seen from Equations (11.20), (11.22), and (11.23), $\alpha$ and $\beta$ are, respectively, the means of $A$ and $B$, and the square roots of the bracketed quantities are, respectively, the standard deviations of $A$ and $B$ with $\sigma^2$ estimated by $\hat{\Sigma}^2$. The derivation given in Section 9.3.2.2 shows that each of these random variables has a $t$-distribution with $(n - 2)$ degrees of freedom.

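• Result iii: estimator $\hat{E}\{Y\} = A + Bx$ for the mean of $Y$ is normally distributed with mean $\alpha + \beta x$ and variance

$$\begin{aligned} \operatorname{var}\{\hat{E}\{Y\}\} &= \operatorname{var}\{A + Bx\} = \operatorname{var}\{A\} + x^2\operatorname{var}\{B\} + 2x\operatorname{cov}\{A, B\} \\ &= \sigma^2\left[\frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2x\bar{x} + x^2\right]\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{-1} \\ &= \sigma^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right], \end{aligned} \tag{11.37}$$

where, from Equation (11.21), $\operatorname{cov}\{A, B\} = -\sigma^2\bar{x}/\sum_{i=1}^{n} (x_i - \bar{x})^2$. Hence, again following the derivation given in Section 9.3.2.2, random variable

$$[\hat{E}\{Y\} - (\alpha + \beta x)]\left\{\hat{\Sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]\right\}^{-1/2} \tag{11.38}$$

is also $t$-distributed with $(n - 2)$ degrees of freedom.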

Based on the results presented above, we can now easily establish confidence limits for all the parameters of interest. The results given below are a direct consequence of the development in Section 9.3.2.

• Result 1: a $[100(1 - \gamma)]\%$ confidence interval for $\alpha$ is determined by [see Equation (9.141)]

$$L_{1,2} = A \mp t_{n-2,\gamma/2}\left[\frac{\hat{\Sigma}^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}\right]^{1/2}. \tag{11.39}$$




Fundamentals of Probability and Statistics for Engineers 


• Result 2: a $[100(1 - \gamma)]\%$ confidence interval for $\beta$ is determined by [see Equation (9.141)]

$$L_{1,2} = B \mp t_{n-2,\gamma/2}\left[\frac{\hat{\Sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]^{1/2}. \tag{11.40}$$


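• Result 3: a $[100(1 - \gamma)]\%$ confidence interval for $E\{Y\} = \alpha + \beta x$ is determined by [see Equation (9.141)]

$$L_{1,2} = \hat{E}\{Y\} \mp t_{n-2,\gamma/2}\left\{\hat{\Sigma}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]\right\}^{1/2}. \tag{11.41}$$

• Result 4: a two-sided $[100(1 - \gamma)]\%$ confidence interval for $\sigma^2$ is determined by [see Equation (9.144)]

$$L_1 = \frac{(n - 2)\hat{\Sigma}^2}{\chi^2_{n-2,\gamma/2}}, \qquad L_2 = \frac{(n - 2)\hat{\Sigma}^2}{\chi^2_{n-2,1-\gamma/2}}. \tag{11.42}$$

If a one-sided confidence interval for $\sigma^2$ is desired, it is given by [see Equation (9.145)]

$$L_1 = \frac{(n - 2)\hat{\Sigma}^2}{\chi^2_{n-2,\gamma}}, \qquad L_2 = \infty. \tag{11.43}$$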
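Results 1, 2, and 4 can be sketched in plain Python for the data of Example 11.3. The critical value $t_{12,0.025} = 2.179$ is the Table A.4 entry quoted in Example 11.4; the $\chi^2$ points for 12 degrees of freedom (23.337 and 4.404) are standard table values that we assume here rather than values given in the text, and the function names are hypothetical. Since no statistics library is used, all critical values are passed in rather than computed.

```python
import math

def coefficient_intervals(x, y, t_crit):
    """Observed limits of Equations (11.39) and (11.40), plus the estimate (11.33).

    t_crit is t_{n-2, gamma/2}, read from a t table; it is not computed here.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    s2 = (syy - beta_hat**2 * sxx) / (n - 2)                          # Equation (11.33)
    half_a = t_crit * math.sqrt(s2 * sum(xi * xi for xi in x) / (n * sxx))
    half_b = t_crit * math.sqrt(s2 / sxx)
    return (alpha_hat - half_a, alpha_hat + half_a), (beta_hat - half_b, beta_hat + half_b), s2

def variance_interval(n, s2, chi2_upper, chi2_lower):
    """Two-sided limits of Equation (11.42).

    chi2_upper = chi^2_{n-2, gamma/2} and chi2_lower = chi^2_{n-2, 1-gamma/2},
    both read from a chi-square table.
    """
    return (n - 2) * s2 / chi2_upper, (n - 2) * s2 / chi2_lower

# Data of Table 11.2 (Example 11.3)
x = [2, 2.5, 3, 5, 7, 9, 10, 12, 15, 16, 17, 18, 19, 20]
y = [9.1, 19.2, 18.0, 31.3, 40.9, 32.0, 54.3, 49.1, 73.0, 91.0, 79.0, 68.0, 110.5, 130.8]

ci_alpha, ci_beta, s2 = coefficient_intervals(x, y, t_crit=2.179)      # t_{12,0.025}
ci_sigma2 = variance_interval(len(x), s2, chi2_upper=23.337, chi2_lower=4.404)
print("alpha:", ci_alpha)
print("beta:", ci_beta)
print("sigma^2:", ci_sigma2)
```

Passing the table values explicitly keeps the sketch dependency-free; with a statistics library available, the same quantiles could be computed instead of typed in.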


A number of observations can be made regarding these confidence intervals. In each case, both the position and the width of the interval will vary from sample to sample. In addition, the confidence interval for $\alpha + \beta x$ is a function of $x$. If one plots the observed values of $L_1$ and $L_2$ as functions of $x$, they form a confidence band about the estimated regression line, as shown in Figure 11.4. Equation (11.41) clearly shows that the narrowest point of the band occurs at $x = \bar{x}$; it becomes broader as $x$ moves away from $\bar{x}$ in either direction.

Example 11.4. Problem: in Example 11.3, assuming that $Y$ is normally distributed, determine a 95% confidence band for $\alpha + \beta x$.



Answer: Equation (11.41) gives the desired confidence limits, with $n = 14$, $\gamma = 0.05$, and

$$\hat{E}\{y\} = \hat{\alpha} + \hat{\beta}x = -0.63 + 5.24x,$$

$$t_{n-2,\gamma/2} = t_{12,0.025} = 2.179, \quad \text{from Table A.4},$$

$$\bar{x} = 11.11, \qquad \sum_{i=1}^{n} (x_i - \bar{x})^2 = 546.09, \qquad \hat{\sigma}^2 = 182.10.$$

The observed confidence limits are thus given by

$$l_{1,2} = (-0.63 + 5.24x) \mp 2.179\left\{182.10\left[\frac{1}{14} + \frac{(x - 11.11)^2}{546.09}\right]\right\}^{1/2}.$$

This result is shown graphically in Figure 11.5.
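The band can be tabulated at a few stress values with the sketch below, which simply plugs the summary numbers of this example into Equation (11.41); the grid of $x$ values and the function name are our own choices. The printed widths grow as $x$ moves away from $\bar{x} = 11.11$, as noted above.

```python
import math

# Summary quantities from Examples 11.3 and 11.4
n, xbar, sxx, s2 = 14, 11.11, 546.09, 182.10
alpha_hat, beta_hat = -0.63, 5.24
t_crit = 2.179  # t_{12, 0.025}, Table A.4

def band(x):
    """Observed 95% limits (l1, l2) for alpha + beta*x, Equation (11.41)."""
    center = alpha_hat + beta_hat * x
    half = t_crit * math.sqrt(s2 * (1.0 / n + (x - xbar) ** 2 / sxx))
    return center - half, center + half

for x in (2, 5, 11.11, 15, 20):
    l1, l2 = band(x)
    print(f"x = {x:5.2f}: l1 = {l1:7.2f}, l2 = {l2:7.2f}, width = {l2 - l1:5.2f}")
```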


11.1.5 SIGNIFICANCE TESTS 

Following the results given above, tests of hypotheses about the values of $\alpha$ and $\beta$ can be carried out based upon the approach discussed in Chapter 10. Let us demonstrate the underlying ideas by testing hypothesis $H_0{:}\ \beta = \beta_0$ against hypothesis $H_1{:}\ \beta \neq \beta_0$, where $\beta_0$ is some specified value.
Figure 11.5 The 95% confidence band for $E\{Y\}$, for Example 11.4 (horizontal axis: stress, x, in g cm$^{-2}$)

Figure 11.6


Using $B$ as the test statistic, we have shown in Section 11.1.4 that the random variable defined by Equation (11.36) has a $t$-distribution with $n - 2$ degrees of freedom. Suppose we wish to achieve a Type-I error probability of $\gamma$. We would reject $H_0$ if $|\hat{\beta} - \beta_0|$ exceeds (see Figure 11.6)

$$t_{n-2,\gamma/2}\left[\frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]^{1/2}. \tag{11.44}$$


Similarly, significance tests about the value of $\alpha$ can be easily carried out with use of $A$ as the test statistic.

An important special case of the above is the test of $H_0{:}\ \beta = 0$ against $H_1{:}\ \beta \neq 0$. This particular situation corresponds essentially to the significance test of linear regression. Accepting $H_0$ is equivalent to concluding that there is no reason to accept a linear relationship between $E\{Y\}$ and $x$ at a specified significance level $\gamma$. In many cases, this may indicate the lack of a causal relationship between $E\{Y\}$ and independent variable $x$.

Example 11.5. Problem: it is speculated that the starting salary of a clerk is a function of the clerk's height. Assume that salary ($Y$) is normally distributed and its mean is linearly related to height ($x$); use the data given in Table 11.3 to test the assumption that $E\{Y\}$ and $x$ are linearly related at the 5% significance level.

Table 11.3 Salary, y (in units of 10,000 dollars), with height, x (in feet), for Example 11.5

x    5.7    5.7    5.7    5.7    6.1    6.1    6.1    6.1
y    2.25   2.10   1.90   1.95   2.40   1.95   2.10   2.25


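Answer: in this case, we wish to test $H_0{:}\ \beta = 0$ against $H_1{:}\ \beta \neq 0$ with $\gamma = 0.05$. From the data in Table 11.3, we have

$$\hat{\beta} = \left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right]\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{-1} = 0.31,$$

$$t_{n-2,\gamma/2} = t_{6,0.025} = 2.447, \quad \text{from Table A.4},$$

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 0.32.$$

According to Equations (11.33) and (11.44), we have

$$\hat{\sigma}^2 = \frac{1}{n - 2}\left[\sum_{i=1}^{n} (y_i - \bar{y})^2 - \hat{\beta}^2\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = 0.031,$$

$$t_{6,0.025}\left[\frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right]^{1/2} = 0.76.$$

Since $\hat{\beta} = 0.31 < 0.76$, we accept $H_0$. That is, we conclude that the data do not indicate a linear relationship between $E\{Y\}$ and $x$ at the 5% significance level; the probability of rejecting $H_0$ when it is in fact true (a Type-I error) is 0.05.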
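The test can be reproduced from Table 11.3 with the plain-Python sketch below; the critical value 2.447 is the Table A.4 entry quoted above, and the variable names are our own. It evaluates Equations (11.33) and (11.44) at full precision and applies the decision rule with $\beta_0 = 0$. Carried at full precision, the data give $\hat{\sigma}^2 \approx 0.031$ and a rejection threshold of about 0.76, so $H_0$ is accepted.

```python
import math

# Table 11.3: height x (ft) and salary y (units of 10,000 dollars)
x = [5.7, 5.7, 5.7, 5.7, 6.1, 6.1, 6.1, 6.1]
y = [2.25, 2.10, 1.90, 1.95, 2.40, 1.95, 2.10, 2.25]
t_crit = 2.447   # t_{6, 0.025}, Table A.4
beta_0 = 0.0     # hypothesized slope under H0

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx
sigma2_hat = (syy - beta_hat**2 * sxx) / (n - 2)     # Equation (11.33)
threshold = t_crit * math.sqrt(sigma2_hat / sxx)      # Equation (11.44)
reject_h0 = abs(beta_hat - beta_0) > threshold

print(f"beta_hat = {beta_hat:.4f}, sigma2_hat = {sigma2_hat:.4f}, threshold = {threshold:.4f}")
print("reject H0" if reject_h0 else "accept H0")
```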

In closing, let us remark that we are often called on to perform tests of simultaneous hypotheses. For example, one may wish to test $H_0{:}\ \alpha = 0$ and $\beta = 1$ against $H_1{:}\ \alpha \neq 0$ or $\beta \neq 1$ or both. Such tests involve both estimators $A$ and $B$ and hence require their joint distribution. This is also often the case in multiple linear regression, to be discussed in the next section. Such tests customarily involve $F$-distributed test statistics, and we will not pursue them here. A general treatment of simultaneous hypothesis testing can be found in Rao (1965), for example.


11.2 MULTIPLE LINEAR REGRESSION 


The vector-matrix approach proposed in the preceding section provides a smooth transition from simple linear regression to linear regression involving more than one independent variable. In multiple linear regression, the model takes the form

$$E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m. \tag{11.45}$$


Again, we assume that the variance of $Y$ is $\sigma^2$ and is independent of $x_1, x_2, \ldots$, and $x_m$. As in simple linear regression, we are interested in estimating $(m + 1)$ regression coefficients $\beta_0, \beta_1, \ldots$, and $\beta_m$, obtaining certain interval estimates, and testing hypotheses about these parameters on the basis of a sample of $Y$ values with their associated values of $(x_1, x_2, \ldots, x_m)$. Let us note that our sample of size $n$ in this case takes the form of arrays $(x_{11}, x_{21}, \ldots, x_{m1}, Y_1), (x_{12}, x_{22}, \ldots, x_{m2}, Y_2), \ldots, (x_{1n}, x_{2n}, \ldots, x_{mn}, Y_n)$. For each set of values $x_{ki}$, $k = 1, 2, \ldots, m$, of $x_k$, $Y_i$ is an independent observation from population $Y$ defined by

$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + E. \tag{11.46}$$

As before, $E$ is the random error, with mean 0 and variance $\sigma^2$.


11.2.1 LEAST SQUARES METHOD OF ESTIMATION 

To estimate the regression coefficients, the method of least squares will again be employed. Given observed sample-value sets $(x_{1i}, x_{2i}, \ldots, x_{mi}, y_i)$, $i = 1, 2, \ldots, n$, the system of observed regression equations in this case takes the form

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_m x_{mi} + e_i, \quad i = 1, 2, \ldots, n. \tag{11.47}$$
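If we let

$$\mathbf{C} = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{m1} \\ 1 & x_{12} & x_{22} & \cdots & x_{m2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix},$$

and

$$\boldsymbol{\theta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix},$$

Equation (11.47) can be represented by the vector-matrix equation

$$\mathbf{y} = \mathbf{C}\boldsymbol{\theta} + \mathbf{e}. \tag{11.48}$$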

Comparing Equation (11.48) with Equation (11.12) in simple linear regression, we see that the observed regression equations in both cases are identical except that the $\mathbf{C}$ matrix is now an $n \times (m + 1)$ matrix and $\boldsymbol{\theta}$ is an $(m + 1)$-dimensional vector. Keeping this dimension difference in mind, the results obtained in the case of simple linear regression based on Equation (11.12) again hold in the multiple linear regression case. Thus, without further derivation, we have for the least-square estimate $\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ [see Equation (11.15)]

$$\hat{\boldsymbol{\theta}} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{y}. \tag{11.49}$$

The existence of the matrix inverse $(\mathbf{C}^{T}\mathbf{C})^{-1}$ requires that $\mathbf{C}$ have full column rank $m + 1$, that is, that no column of $\mathbf{C}$ be a linear combination of the other columns; in particular, at least $(m + 1)$ distinct sets of values of $(x_{1i}, x_{2i}, \ldots, x_{mi})$ must be represented in the sample. It is noted that $\mathbf{C}^{T}\mathbf{C}$ is an $(m + 1) \times (m + 1)$ symmetric matrix.


Example 11.6. Problem: the average monthly electric power consumption ($Y$) at a certain manufacturing plant is considered to be linearly dependent on the average ambient temperature ($x_1$) and the number of working days in a month ($x_2$). Consider the one-year monthly data given in Table 11.4. Determine the least-square estimates of the associated linear regression coefficients.


Table 11.4 Average monthly power consumption, y (in thousands of kWh), with number of working days in the month, $x_2$, and average ambient temperature, $x_1$ (in °F), for Example 11.6

x1 (°F)   20   26   41   55   60   67   75   79   70   55   45   33
x2        23   21   24   25   24   26   25   25   24   25   25   23
y         210  206  260  244  271  285  270  265  234  241  258  230
Answer: in this case, C is a $12 \times 3$ matrix, and

$$C^T C = \begin{bmatrix} 12 & 626 & 290 \\ 626 & 36{,}776 & 15{,}336 \\ 290 & 15{,}336 & 7{,}028 \end{bmatrix}, \qquad C^T y = \begin{bmatrix} 2{,}974 \\ 159{,}011 \\ 72{,}166 \end{bmatrix}.$$

We thus have, upon finding the inverse of $C^T C$ by using either matrix inversion
formulae or readily available matrix inversion computer programs,

$$\hat{\theta} = (C^T C)^{-1} C^T y = \begin{bmatrix} -33.84 \\ 0.39 \\ 10.80 \end{bmatrix},$$

or

$$\hat{\beta}_0 = -33.84, \quad \hat{\beta}_1 = 0.39, \quad \hat{\beta}_2 = 10.80.$$

The estimated regression equation based on the data is thus

$$\hat{E}\{Y\} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 = -33.84 + 0.39 x_1 + 10.80 x_2.$$
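As a cross-check on the arithmetic, the computation of Example 11.6 can be reproduced numerically. The sketch below assumes Python with NumPy; the variable names are illustrative only. It solves the normal equations $(C^T C)\hat{\theta} = C^T y$ with `np.linalg.solve` instead of forming $(C^T C)^{-1}$ explicitly, which gives the same $\hat{\theta}$ of Equation (11.49) in exact arithmetic.

```python
import numpy as np

# Table 11.4: x1 = average ambient temperature (deg F), x2 = working days,
# y = average monthly power consumption (thousands of kWh)
x1 = np.array([20, 26, 41, 55, 60, 67, 75, 79, 70, 55, 45, 33], dtype=float)
x2 = np.array([23, 21, 24, 25, 24, 26, 25, 25, 24, 25, 25, 23], dtype=float)
y = np.array([210, 206, 260, 244, 271, 285, 270, 265, 234, 241, 258, 230], dtype=float)

# The 12 x 3 matrix C of Equation (11.48): a column of ones, then x1 and x2
C = np.column_stack([np.ones_like(x1), x1, x2])

CtC = C.T @ C      # compare with the 3 x 3 matrix printed in the example
Cty = C.T @ y      # compare with the right-hand vector printed in the example

# Equation (11.49), evaluated by solving (C^T C) theta_hat = C^T y
theta_hat = np.linalg.solve(CtC, Cty)

print(CtC)
print(Cty)
print(np.round(theta_hat, 2))   # approximately [-33.84, 0.39, 10.80]
```

Solving the linear system rather than inverting $C^T C$ is a common numerical habit; for a $3 \times 3$ system either route is adequate.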

Since Equation (11.48) is identical in form to its counterpart in the case of simple linear
regression, many of the results obtained therein concerning properties of least-square
estimators, confidence intervals, and hypothesis testing carry over here with, of course,
due regard to the new definitions for matrix C and vector $\theta$.

Let us write estimator $\hat{\Theta}$ for $\theta$ in the form

$$\hat{\Theta} = (C^T C)^{-1} C^T Y. \qquad (11.50)$$

Since $E\{Y\} = C\theta$, we see immediately that

$$E\{\hat{\Theta}\} = (C^T C)^{-1} C^T E\{Y\} = (C^T C)^{-1} C^T C \theta = \theta. \qquad (11.51)$$

Hence, least-square estimator $\hat{\Theta}$ is again unbiased. It also follows from
Equation (11.21) that the covariance matrix for $\hat{\Theta}$ is given by

$$\mathrm{cov}\{\hat{\Theta}\} = \sigma^2 (C^T C)^{-1}. \qquad (11.52)$$
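Equations (11.51) and (11.52) can be checked by simulation. The sketch below is illustrative only: it reuses the design matrix of Example 11.6, treats the estimates found there as if they were the true $\theta$, and assumes normally distributed errors with a hypothetical $\sigma = 10$; neither assumption comes from the text. It also evaluates the estimator $S^2$ with divisor $n - m - 1$ stated in Problem 11.5(b).

```python
import numpy as np

# Design matrix C from Example 11.6 (ones, temperature x1, working days x2)
x1 = np.array([20, 26, 41, 55, 60, 67, 75, 79, 70, 55, 45, 33], dtype=float)
x2 = np.array([23, 21, 24, 25, 24, 26, 25, 25, 24, 25, 25, 23], dtype=float)
C = np.column_stack([np.ones_like(x1), x1, x2])
n, p = C.shape                                  # p = m + 1 parameters

theta_true = np.array([-33.84, 0.39, 10.80])    # assumed "true" theta
sigma = 10.0                                    # assumed error standard deviation
n_rep = 20_000                                  # number of simulated samples

rng = np.random.default_rng(0)
# Each row is one simulated sample Y = C theta + E, with E ~ N(0, sigma^2 I)
Y = C @ theta_true + sigma * rng.standard_normal((n_rep, n))

# Row r of theta_hats is (C^T C)^{-1} C^T Y_r, Equation (11.50)
A = np.linalg.solve(C.T @ C, C.T)               # shape (p, n)
theta_hats = Y @ A.T                            # shape (n_rep, p)

# S^2 with divisor n - m - 1 = n - p, as in Problem 11.5(b)
residuals = Y - theta_hats @ C.T
s2 = (residuals ** 2).sum(axis=1) / (n - p)

theoretical_cov = sigma ** 2 * np.linalg.inv(C.T @ C)   # Equation (11.52)
empirical_cov = np.cov(theta_hats, rowvar=False)

print("mean of estimates:    ", theta_hats.mean(axis=0))
print("empirical variances:  ", np.diag(empirical_cov))
print("theoretical variances:", np.diag(theoretical_cov))
print("mean of S^2:", s2.mean())
```

The sample mean of the estimates should sit close to `theta_true`, and the empirical variances close to the diagonal of $\sigma^2 (C^T C)^{-1}$, up to Monte Carlo error.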
Confidence intervals for the regression parameters in this case can also be
established by following procedures similar to those employed in the case of simple linear
regression. Concerning hypothesis testing, it was mentioned in Section 11.1.5
that testing of simultaneous hypotheses is more appropriate in multiple linear
regression; we will not pursue it here.


11.3 OTHER REGRESSION MODELS 

In science and engineering, one often finds it necessary to consider regression 
models that are nonlinear in the independent variables. Common examples of 
this class of models include 


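$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + E, \qquad (11.53)$$

$$Y = \beta_0 \exp(\beta_1 x + E), \qquad (11.54)$$

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + E, \qquad (11.55)$$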

$$Y = \beta_1 x^{\beta_2} + E. \qquad (11.56)$$


Polynomial models such as Equation (11.53) or Equation (11.55) are still
linear regression models in that they are linear in the unknown parameters
$\beta_0, \beta_1, \beta_2, \ldots, \beta_{12}$. Hence, they can be estimated by using multiple linear
regression techniques. Indeed, if we let $x_1 = x$ and $x_2 = x^2$ in Equation (11.53), it
takes the form of a multiple linear regression model with two independent
variables and can thus be analyzed as such. A similar equivalence can be established
between Equation (11.55) and a multiple linear regression model with
five independent variables.

Consider the exponential model given by Equation (11.54). Taking logarithms
of both sides, we have

$$\ln Y = \ln \beta_0 + \beta_1 x + E. \qquad (11.57)$$

In terms of random variable $\ln Y$, Equation (11.57) represents a linear regression
equation with regression coefficients $\ln \beta_0$ and $\beta_1$. Linear regression techniques
again apply in this case. Equation (11.56), however, cannot be conveniently
put into a linear regression form.
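A minimal sketch of this log transformation, assuming Python with NumPy: the data below are synthetic and noise-free, generated from hypothetical values $\beta_0 = 2$ and $\beta_1 = 0.5$, so that fitting $\ln y$ on $x$ should recover $\ln \beta_0$ and $\beta_1$ essentially exactly. With real data the error term $E$ enters additively on the log scale, which is what Equation (11.57) assumes.

```python
import numpy as np

# Hypothetical parameters, used only to generate illustrative data
beta0_true, beta1_true = 2.0, 0.5
x = np.linspace(0.0, 4.0, 9)
y = beta0_true * np.exp(beta1_true * x)   # Equation (11.54) with E = 0

# Equation (11.57): regress ln y on x, with a column of ones for ln(beta0)
C = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(C, np.log(y), rcond=None)
ln_beta0_hat, beta1_hat = coef

beta0_hat = np.exp(ln_beta0_hat)          # transform the intercept back to beta0
print(beta0_hat, beta1_hat)
```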

Example 11.7. Problem: on average, the rate of population increase (Y) associated
with a given city varies with x, the number of years after 1970. Assuming that

$$E\{Y\} = \beta_0 + \beta_1 x + \beta_2 x^2,$$

compute the least-square estimates for $\beta_0$, $\beta_1$, and $\beta_2$ based on the data
presented in Table 11.5.
Table 11.5 Population increase, y, with
number of years after 1970, x, for Example 11.7

x          0     1     2     3     4     5
y (%)   1.03  1.32  1.57  1.75  1.83  2.33


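Answer: let $x_1 = x$, $x_2 = x^2$, and let

$$\theta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}.$$

The least-square estimate $\hat{\theta}$ of $\theta$ is given by Equation (11.49), with

$$C = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} 1.03 \\ 1.32 \\ 1.57 \\ 1.75 \\ 1.83 \\ 2.33 \end{bmatrix}.$$

Thus

$$\hat{\theta} = (C^T C)^{-1} C^T y = \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}^{-1} \begin{bmatrix} 9.83 \\ 28.68 \\ 110.88 \end{bmatrix} = \begin{bmatrix} 1.07 \\ 0.20 \\ 0.01 \end{bmatrix},$$

or

$$\hat{\beta}_0 = 1.07, \quad \hat{\beta}_1 = 0.20, \quad \text{and} \quad \hat{\beta}_2 = 0.01.$$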
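The same numbers can be reproduced with a short NumPy sketch (illustrative, not part of the original solution). It builds C from the columns $1, x, x^2$, exactly as in the substitution $x_1 = x$, $x_2 = x^2$, and solves the normal equations.

```python
import numpy as np

# Table 11.5: x = years after 1970, y = population increase (%)
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1.03, 1.32, 1.57, 1.75, 1.83, 2.33])

# With x1 = x and x2 = x^2 the quadratic model is a multiple linear
# regression whose design matrix has columns 1, x, x^2
C = np.column_stack([np.ones_like(x), x, x ** 2])

CtC = C.T @ C          # [[6, 15, 55], [15, 55, 225], [55, 225, 979]]
Cty = C.T @ y          # approximately [9.83, 28.68, 110.88]
theta_hat = np.linalg.solve(CtC, Cty)

print(np.round(theta_hat, 4))
```

Carried to more digits, the estimate of $\beta_2$ comes out near 0.0066, which the example reports rounded to 0.01.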
Let us note in this example that, since $x_2 = x_1^2$, matrix C is constrained in that
the elements in its third column are the squares of the corresponding
elements in its second column. It should be cautioned that, for high-order
polynomial regression models, constraints of this type may render matrix $C^T C$
ill-conditioned and lead to matrix-inversion difficulties.
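The ill-conditioning warning can be illustrated with a NumPy sketch (an illustration, not a method from the text). For the x values of Table 11.5 it compares the condition number of $C^T C$ for polynomial degrees 1 through 5. A common remedy, also sketched, is to avoid forming $C^T C$ at all and to solve the least-squares problem directly with `np.linalg.lstsq`, which uses an SVD-based LAPACK solver.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1.03, 1.32, 1.57, 1.75, 1.83, 2.33])

conds = {}
for degree in range(1, 6):
    # Columns 1, x, x^2, ..., x^degree (increasing=True gives this order)
    C = np.vander(x, degree + 1, increasing=True)
    conds[degree] = np.linalg.cond(C.T @ C)
    print(degree, f"{conds[degree]:.3e}")

# The quadratic fit of Example 11.7, solved without forming C^T C
C2 = np.vander(x, 3, increasing=True)
theta_hat, *_ = np.linalg.lstsq(C2, y, rcond=None)
print(np.round(theta_hat, 4))
```

The condition number grows rapidly with the degree, because the columns $x^k$ become nearly proportional to one another over the data range.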


REFERENCE 
Some additional useful references on regression analysis are given below.

Bendat, J.S., and Piersol, A.G., 1966, Measurement and Analysis of Random Data, John
Wiley & Sons Inc., New York.

Draper, N., and Smith, H., 1966, Applied Regression Analysis, John Wiley & Sons Inc.,
New York.

Graybill, F.A., 1961, An Introduction to Linear Statistical Models, Volume I, McGraw-Hill,
New York.


PROBLEMS 

11.1 A special case of simple linear regression is given by

$$Y = \beta x + E.$$

Determine:

(a) The least-square estimator $\hat{B}$ for $\beta$;

(b) The mean and variance of $\hat{B}$;

(c) An unbiased estimator for $\sigma^2$, the variance of Y.

11.2 In simple linear regression, show that the maximum likelihood estimators for $\alpha$ and
$\beta$ are identical to their least-square estimators when Y is normally distributed.

11.3 Determine the maximum likelihood estimator for the variance of Y in simple linear
regression, assuming that Y is normally distributed. Is it a biased estimator?

11.4 Since data quality is generally not uniform among data points, it is sometimes
desirable to estimate the regression coefficients by minimizing the sum of weighted
squared residuals; that is, $\alpha$ and $\beta$ in simple linear regression are found by minimizing

$$\sum_{i=1}^{n} w_i e_i^2 = \sum_{i=1}^{n} w_i (y_i - \alpha - \beta x_i)^2,$$

where $w_i$ are assigned weights. In vector-matrix notation, show that estimates
$\hat{\alpha}$ and $\hat{\beta}$ now take the form

$$\hat{\theta} = \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \end{bmatrix} = (C^T W C)^{-1} C^T W y,$$

where

$$W = \begin{bmatrix} w_1 & & & 0 \\ & w_2 & & \\ & & \ddots & \\ 0 & & & w_n \end{bmatrix}.$$

11.5 (a) In simple linear regression [Equation (11.4)], use vector-matrix notation and
show that the unbiased estimator for $\sigma^2$ given by Equation (11.33) can be
written in the form

$$S^2 = \frac{1}{n - 2}\left[(Y - C\hat{\Theta})^T (Y - C\hat{\Theta})\right].$$

(b) In multiple linear regression [Equation (11.46)], show that an unbiased estimator
for $\sigma^2$ is given by

$$S^2 = \frac{1}{n - m - 1}\left[(Y - C\hat{\Theta})^T (Y - C\hat{\Theta})\right].$$

11.6 Given the data in Table 11.6:

Table 11.6 Data for Problem 11.6

x     0    1    2    3    4    5    6    7    8    9
y   3.2  3.1  3.9  4.7  4.3  4.4  4.8  5.3  5.9  6.0

(a) Determine the least-square estimates of $\alpha$ and $\beta$ in the linear regression
equation

$$Y = \alpha + \beta x + E.$$

(b) Determine an unbiased estimate of $\sigma^2$, the variance of Y.

(c) Estimate $E\{Y\}$ at $x = 5$.

(d) Determine a 95% confidence interval for $\beta$.

(e) Determine a 95% confidence band for $\alpha + \beta x$.

11.7 In transportation studies, it is assumed that, on average, peak vehicle noise level
(Y) is linearly related to the logarithm of vehicle speed (v). Some measurements
taken for a class of light vehicles are given in Table 11.7. Assuming that

$$Y = \alpha + \beta \log_{10} v + E,$$

determine the estimated regression line for Y as a function of $\log_{10} v$.

Table 11.7 Noise level, y (in dB), with vehicle
speed, v (in km h⁻¹), for Problem 11.7

v    20   30   40   50   60   70   80   90  100
y    55   63   68   70   72   78   74   76   79

11.8 An experimental study of nasal deposition of particles was carried out and
showed a linear relationship between $E\{Y\}$ and $\ln d^2 f$, where Y is the fraction
of particles of aerodynamic diameter d (in mm) that is deposited in the nose
during an inhalation of f (l min⁻¹). Consider the data given in Table 11.8 (four
readings are taken at each value of $\ln d^2 f$). Estimate the regression parameters in
the linear regression equation

$$E\{Y\} = \alpha + \beta \ln d^2 f,$$

and estimate $\sigma^2$, the variance of Y.

Table 11.8 Fraction of particles inhaled of diameter d (in mm), with ln d²f
(f is inhalation rate in l min⁻¹), for Problem 11.8

ln d²f    1.6   1.7   2.0   2.8   3.0   3.0   3.6
y        0.39  0.41  0.42  0.61  0.83  0.79  0.98
         0.30  0.28  0.34  0.51  0.79  0.69  0.88
         0.21  0.20  0.22  0.47  0.70  0.63  0.87
         0.12  0.10  0.18  0.39  0.61  0.59  0.83

11.9 For a study of the stress-strain history of soft biological tissues, experimental
results relating dynamic moduli of aorta (D) to stress frequency ($\omega$) are given in
Table 11.9.

(a) Assuming that $E\{D\} = \alpha + \beta\omega$ and $\mathrm{var}\{D\} = \sigma^2$, estimate regression coefficients
$\alpha$ and $\beta$.

(b) Determine a one-sided 95% confidence interval for the variance of D.

(c) Test whether the slope estimate is significantly different from zero at the 5%
significance level.

Table 11.9 The dynamic modulus of aorta, d (normalized), with frequency,
ω (in Hz), for Problem 11.9

ω      1     2     3     4     5     6     7     8     9    10
d   1.60  1.51  1.40  1.57  1.60  1.59  1.80  1.59  1.82  1.59


11.10 Given the data in Table 11.10:

(a) Determine the least-square estimates of $\beta_0$, $\beta_1$, and $\beta_2$ assuming that

$$E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

Table 11.10 Data for Problem 11.10

x1    -1   -1    1    1    2    2    3    3
x2     1    2    3    4    5    6    7    8
y    2.0  3.1  4.8  4.9  5.4  6.8  6.9  7.5

(b) Estimate $E\{Y\}$ at $x_1 = x_2 = 2$.

11.11 In Problem 11.7, when vehicle weight is taken into account, we have the multiple
linear regression equation

$$Y = \beta_0 + \beta_1 \log_{10} v + \beta_2 \log_{10} w + E,$$

where w is vehicle unladen weight in Mg. Use the data given in Table 11.11 and
estimate the regression parameters in this case.

Table 11.11 Noise level, y (in dB), with vehicle weight, w (unladen,
in Mg), and vehicle speed, v (in km h⁻¹), for Problem 11.11

v     20    40    60    80   100   120
w    1.0   1.0   1.7   3.0   1.0   0.7
y     54    59    78    91    78    67

11.12 Given the data in Table 11.12:

Table 11.12 Data for Problem 11.12

x     0    1    2    3    4    5    6    7
y   3.2  2.8  5.1  7.3  7.6  5.9  4.1  1.8

(a) Determine the least-square estimates of $\beta_0$, $\beta_1$, and $\beta_2$ assuming that

$$E\{Y\} = \beta_0 + \beta_1 x + \beta_2 x^2.$$

(b) Estimate $E\{Y\}$ at $x = 3$.

11.13 A large number of socioeconomic variables are important in accounting for mortality
rate. Assuming a multiple linear regression model, one version of the model for
mortality rate (Y) is expressed by

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + E,$$

where

x1 = mean annual precipitation in inches,

x2 = education in terms of median school years completed for those over 25 years
old,

x3 = percentage of area population that is nonwhite,

x4 = relative pollution potential of SO2 (sulfur dioxide).

Table 11.13 Data for Problem 11.13

x1      13    11    21    30    35    27    27    40
x2       9  10.5    11    10     9  12.3     9     9
x3     1.5     7    21    27    30     6    27    33
x4       4    21    64    67    17    28    82   101
y      795   841   820  1050  1010   970   980  1090

Some available data are presented in Table 11.13. Determine the least-square
estimates of the regression parameters.
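The weighted form asked for in Problem 11.4 can be checked numerically; this is a sketch, not a derivation. With NumPy, the weighted estimate $(C^T W C)^{-1} C^T W y$ should coincide with ordinary least squares applied to the rows of C and y after each row is multiplied by $\sqrt{w_i}$. The data are those of Table 11.6, and the weights are arbitrary hypothetical values chosen only for illustration.

```python
import numpy as np

# Data of Table 11.6; the weights are hypothetical, for illustration only
x = np.arange(10, dtype=float)
y = np.array([3.2, 3.1, 3.9, 4.7, 4.3, 4.4, 4.8, 5.3, 5.9, 6.0])
w = np.array([1.0, 1.0, 2.0, 2.0, 0.5, 0.5, 1.0, 3.0, 1.0, 1.0])

C = np.column_stack([np.ones_like(x), x])
W = np.diag(w)

# Weighted least-square estimate (C^T W C)^{-1} C^T W y
theta_w = np.linalg.solve(C.T @ W @ C, C.T @ W @ y)

# Equivalent route: ordinary least squares on rows scaled by sqrt(w_i)
sw = np.sqrt(w)
theta_scaled, *_ = np.linalg.lstsq(C * sw[:, None], y * sw, rcond=None)

print(theta_w, theta_scaled)
```

Setting every $w_i = 1$ makes W the identity matrix, and the expression reduces to the ordinary least-square estimate of Equation (11.49).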
A.1 BINOMIAL MASS FUNCTION

Table A.1 Binomial mass function: a table of $p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}$
for n = 2 to 10, p = 0.01 to 0.50

n  k      0.01    0.05    0.10    0.15    0.20    0.25    0.30     1/3    0.35    0.40    0.45    0.49    0.50
2  0    0.9801  0.9025  0.8100  0.7225  0.6400  0.5625  0.4900  0.4444  0.4225  0.3600  0.3025  0.2601  0.2500
   1    0.0198  0.0950  0.1800  0.2550  0.3200  0.3750  0.4200  0.4444  0.4550  0.4800  0.4950  0.4998  0.5000
   2    0.0001  0.0025  0.0100  0.0225  0.0400  0.0625  0.0900  0.1111  0.1225  0.1600  0.2025  0.2401  0.2500
3  0    0.9703  0.8574  0.7290  0.6141  0.5120  0.4219  0.3430  0.2963  0.2746  0.2160  0.1664  0.1327  0.1250
   1    0.0294  0.1354  0.2430  0.3251  0.3840  0.4219  0.4410  0.4444  0.4436  0.4320  0.4084  0.3823  0.3750
   2    0.0003  0.0071  0.0270  0.0574  0.0960  0.1406  0.1890  0.2222  0.2389  0.2880  0.3341  0.3674  0.3750
   3    0.0000  0.0001  0.0010  0.0034  0.0080  0.0156  0.0270  0.0370  0.0429  0.0640  0.0911  0.1176  0.1250
4  0    0.9606  0.8145  0.6561  0.5220  0.4096  0.3164  0.2401  0.1975  0.1785  0.1296  0.0915  0.0677  0.0625
   1    0.0388  0.1715  0.2916  0.3685  0.4096  0.4219  0.4116  0.3951  0.3845  0.3456  0.2995  0.2600  0.2500
   2    0.0006  0.0135  0.0486  0.0975  0.1536  0.2109  0.2646  0.2963  0.3105  0.3456  0.3675  0.3747  0.3750
   3    0.0000  0.0005  0.0036  0.0115  0.0256  0.0469  0.0756  0.0988  0.1115  0.1536  0.2005  0.2400  0.2500
   4    0.0000  0.0000  0.0001  0.0005  0.0016  0.0039  0.0081  0.0123  0.0150  0.0256  0.0410  0.0576  0.0625
5  0    0.9510  0.7738  0.5905  0.4437  0.3277  0.2373  0.1681  0.1317  0.1160  0.0778  0.0503  0.0345  0.0312
   1    0.0480  0.2036  0.3280  0.3915  0.4096  0.3955  0.3602  0.3292  0.3124  0.2592  0.2059  0.1657  0.1562
   2    0.0010  0.0214  0.0729  0.1382  0.2048  0.2637  0.3087  0.3292  0.3364  0.3456  0.3369  0.3185  0.3125
   3    0.0000  0.0011  0.0081  0.0244  0.0512  0.0879  0.1323  0.1646  0.1811  0.2304  0.2757  0.3060  0.3125
   4    0.0000  0.0000  0.0004  0.0022  0.0064  0.0146  0.0284  0.0412  0.0488  0.0768  0.1128  0.1470  0.1562
   5    0.0000  0.0000  0.0000  0.0001  0.0003  0.0010  0.0024  0.0041  0.0053  0.0102  0.0185  0.0283  0.0312
6  0    0.9415  0.7351  0.5314  0.3771  0.2621  0.1780  0.1176  0.0878  0.0754  0.0467  0.0277  0.0176  0.0156
   1    0.0571  0.2321  0.3543  0.3993  0.3932  0.3560  0.3025  0.2634  0.2437  0.1866  0.1359  0.1014  0.0938
   2    0.0014  0.0305  0.0984  0.1762  0.2458  0.2966  0.3241  0.3292  0.3280  0.3110  0.2780  0.2437  0.2344
   3    0.0000  0.0021  0.0146  0.0415  0.0819  0.1318  0.1852  0.2195  0.2355  0.2765  0.3032  0.3121  0.3125
   4    0.0000  0.0001  0.0012  0.0055  0.0154  0.0330  0.0595  0.0823  0.0951  0.1382  0.1861  0.2249  0.2344
   5    0.0000  0.0000  0.0001  0.0004  0.0015  0.0044  0.0102  0.0165  0.0205  0.0369  0.0609  0.0864  0.0938
   6    0.0000  0.0000  0.0000  0.0000  0.0001  0.0002  0.0007  0.0014  0.0018  0.0041  0.0083  0.0139  0.0156
7  0    0.9321  0.6983  0.4783  0.3206  0.2097  0.1335  0.0824  0.0585  0.0490  0.0280  0.0152  0.0090  0.0078
   1    0.0659  0.2573  0.3720  0.3960  0.3670  0.3115  0.2471  0.2048  0.1848  0.1306  0.0872  0.0603  0.0547
   2    0.0020  0.0406  0.1240  0.2097  0.2753  0.3115  0.3177  0.3073  0.2985  0.2613  0.2140  0.1740  0.1641
   3    0.0000  0.0036  0.0230  0.0617  0.1147  0.1730  0.2269  0.2561  0.2679  0.2903  0.2918  0.2786  0.2734
   4    0.0000  0.0002  0.0026  0.0109  0.0287  0.0577  0.0972  0.1280  0.1442  0.1935  0.2388  0.2676  0.2734
   5    0.0000  0.0000  0.0002  0.0012  0.0043  0.0115  0.0250  0.0384  0.0466  0.0774  0.1172  0.1543  0.1641
   6    0.0000  0.0000  0.0000  0.0001  0.0004  0.0013  0.0036  0.0064  0.0084  0.0172  0.0320  0.0494  0.0547
   7    0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0002  0.0005  0.0006  0.0016  0.0037  0.0068  0.0078
8  0    0.9227  0.6634  0.4305  0.2725  0.1678  0.1001  0.0576  0.0390  0.0319  0.0168  0.0084  0.0046  0.0039
   1    0.0746  0.2793  0.3826  0.3847  0.3355  0.2670  0.1977  0.1561  0.1373  0.0896  0.0548  0.0352  0.0312
   2    0.0026  0.0515  0.1488  0.2376  0.2936  0.3115  0.2965  0.2731  0.2587  0.2090  0.1569  0.1183  0.1094
   3    0.0001  0.0054  0.0331  0.0839  0.1468  0.2076  0.2541  0.2731  0.2786  0.2787  0.2568  0.2273  0.2188
   4    0.0000  0.0004  0.0046  0.0185  0.0459  0.0865  0.1361  0.1707  0.1875  0.2322  0.2627  0.2730  0.2734
   5    0.0000  0.0000  0.0004  0.0026  0.0092  0.0231  0.0467  0.0683  0.0808  0.1239  0.1719  0.2098  0.2188
   6    0.0000  0.0000  0.0000  0.0002  0.0011  0.0038  0.0100  0.0171  0.0217  0.0413  0.0703  0.1008  0.1094
   7    0.0000  0.0000  0.0000  0.0000  0.0001  0.0004  0.0012  0.0024  0.0033  0.0079  0.0164  0.0277  0.0312
   8    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0002  0.0002  0.0007  0.0017  0.0033  0.0039
9  0    0.9135  0.6302  0.3874  0.2316  0.1342  0.0751  0.0404  0.0260  0.0207  0.0101  0.0046  0.0023  0.0020
   1    0.0830  0.2985  0.3874  0.3679  0.3020  0.2253  0.1556  0.1171  0.1004  0.0605  0.0339  0.0202  0.0176
   2    0.0034  0.0629  0.1722  0.2597  0.3020  0.3003  0.2668  0.2341  0.2162  0.1612  0.1110  0.0776  0.0703
   3    0.0001  0.0077  0.0446  0.1069  0.1762  0.2336  0.2668  0.2731  0.2716  0.2508  0.2119  0.1739  0.1641
   4    0.0000  0.0006  0.0074  0.0283  0.0661  0.1168  0.1715  0.2048  0.2194  0.2508  0.2600  0.2506  0.2461
   5    0.0000  0.0000  0.0008  0.0050  0.0165  0.0389  0.0735  0.1024  0.1181  0.1672  0.2128  0.2408  0.2461
   6    0.0000  0.0000  0.0001  0.0006  0.0028  0.0087  0.0210  0.0341  0.0424  0.0743  0.1160  0.1542  0.1641
   7    0.0000  0.0000  0.0000  0.0000  0.0003  0.0012  0.0039  0.0073  0.0098  0.0212  0.0407  0.0635  0.0703
   8    0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0004  0.0009  0.0013  0.0035  0.0083  0.0153  0.0176
   9    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0001  0.0003  0.0008  0.0016  0.0020
10 0    0.9044  0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0173  0.0135  0.0060  0.0025  0.0012  0.0010
   1    0.0914  0.3151  0.3874  0.3474  0.2684  0.1877  0.1211  0.0867  0.0725  0.0403  0.0207  0.0114  0.0098
   2    0.0042  0.0746  0.1937  0.2759  0.3020  0.2816  0.2335  0.1951  0.1757  0.1209  0.0736  0.0495  0.0439
   3    0.0001  0.0105  0.0574  0.1298  0.2013  0.2503  0.2668  0.2601  0.2522  0.2150  0.1665  0.1267  0.1172
   4    0.0000  0.0010  0.0112  0.0401  0.0881  0.1460  0.2001  0.2276  0.2377  0.2508  0.2384  0.2130  0.2051
   5    0.0000  0.0001  0.0015  0.0085  0.0264  0.0584  0.1029  0.1366  0.1536  0.2007  0.2340  0.2456  0.2461
   6    0.0000  0.0000  0.0001  0.0012  0.0055  0.0162  0.0368  0.0569  0.0689  0.1115  0.1596  0.1966  0.2051
   7    0.0000  0.0000  0.0000  0.0001  0.0008  0.0031  0.0090  0.0163  0.0212  0.0425  0.0746  0.1080  0.1172
   8    0.0000  0.0000  0.0000  0.0000  0.0001  0.0004  0.0014  0.0030  0.0043  0.0106  0.0229  0.0389  0.0439
   9    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0003  0.0005  0.0016  0.0042  0.0083  0.0098
   10   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0003  0.0008  0.0010

From Parzen, E., 1960, Modern Probability Theory and Its Applications, John Wiley & Sons, with permission.
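Entries of Table A.1 can be regenerated directly from the binomial mass function. A minimal sketch using only the Python standard library (`math.comb` needs Python 3.8 or later); rounding to four decimals mimics the table.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """p_X(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Spot checks against Table A.1 (four-decimal rounding)
print(round(binomial_pmf(3, 10, 0.50), 4))   # table entry 0.1172
print(round(binomial_pmf(1, 2, 1 / 3), 4))   # table entry 0.4444
print(round(binomial_pmf(0, 7, 0.01), 4))    # table entry 0.9321
```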





Appendix A: Tables 




A.2 POISSON MASS FUNCTION

Table A.2 Poisson mass function: a table of $p_X(k) = \dfrac{(\lambda t)^k e^{-\lambda t}}{k!}$
for k = 0 to 24, λt = 0.1 to 10

λt            0       1       2       3       4       5       6       7       8       9      10      11      12
0.1     0.9048  0.0905  0.0045  0.0002  0.0000
0.2     0.8187  0.1637  0.0164  0.0011  0.0001  0.0000
0.3     0.7408  0.2222  0.0333  0.0033  0.0002  0.0000
0.4     0.6703  0.2681  0.0536  0.0072  0.0007  0.0001  0.0000
0.5     0.6065  0.3033  0.0758  0.0126  0.0016  0.0002  0.0000
0.6     0.5488  0.3293  0.0988  0.0198  0.0030  0.0004  0.0000
0.7     0.4966  0.3476  0.1217  0.0284  0.0050  0.0007  0.0001  0.0000
0.8     0.4493  0.3595  0.1438  0.0383  0.0077  0.0012  0.0002  0.0000
0.9     0.4066  0.3659  0.1647  0.0494  0.0111  0.0020  0.0003  0.0000
1.0     0.3679  0.3679  0.1839  0.0613  0.0153  0.0031  0.0005  0.0001  0.0000
1.1     0.3329  0.3662  0.2014  0.0738  0.0203  0.0045  0.0008  0.0001  0.0000
1.2     0.3012  0.3614  0.2169  0.0867  0.0260  0.0062  0.0012  0.0002  0.0000
1.3     0.2725  0.3543  0.2303  0.0998  0.0324  0.0084  0.0018  0.0003  0.0001  0.0000
1.4     0.2466  0.3452  0.2417  0.1128  0.0395  0.0111  0.0026  0.0005  0.0001  0.0000
1.5     0.2231  0.3347  0.2510  0.1255  0.0471  0.0141  0.0035  0.0008  0.0001  0.0000
1.6     0.2019  0.3230  0.2584  0.1378  0.0551  0.0176  0.0047  0.0011  0.0002  0.0000
1.7     0.1827  0.3106  0.2640  0.1496  0.0636  0.0216  0.0061  0.0015  0.0003  0.0001  0.0000
1.8     0.1653  0.2975  0.2678  0.1607  0.0723  0.0260  0.0078  0.0020  0.0005  0.0001  0.0000
1.9     0.1496  0.2842  0.2700  0.1710  0.0812  0.0309  0.0098  0.0027  0.0006  0.0001  0.0000
2.0     0.1353  0.2707  0.2707  0.1804  0.0902  0.0361  0.0120  0.0034  0.0009  0.0002  0.0000
2.2     0.1108  0.2438  0.2681  0.1966  0.1082  0.0476  0.0174  0.0055  0.0015  0.0004  0.0001  0.0000
2.4     0.0907  0.2177  0.2613  0.2090  0.1254  0.0602  0.0241  0.0083  0.0025  0.0007  0.0002  0.0000
2.6     0.0743  0.1931  0.2510  0.2176  0.1414  0.0735  0.0319  0.0118  0.0038  0.0011  0.0003  0.0001  0.0000
2.8     0.0608  0.1703  0.2384  0.2225  0.1557  0.0872  0.0407  0.0163  0.0057  0.0018  0.0005  0.0001  0.0000
3.0     0.0498  0.1494  0.2240  0.2240  0.1680  0.1008  0.0504  0.0216  0.0081  0.0027  0.0008  0.0002  0.0001
3.2     0.0408  0.1304  0.2087  0.2226  0.1781  0.1140  0.0608  0.0278  0.0111  0.0040  0.0013  0.0004  0.0001
3.4     0.0334  0.1135  0.1929  0.2186  0.1858  0.1264  0.0716  0.0348  0.0148  0.0056  0.0019  0.0006  0.0002
3.6     0.0273  0.0984  0.1771  0.2125  0.1912  0.1377  0.0826  0.0425  0.0191  0.0076  0.0028  0.0009  0.0003
3.8     0.0224  0.0850  0.1615  0.2046  0.1944  0.1477  0.0936  0.0508  0.0241  0.0102  0.0039  0.0013  0.0004
4.0     0.0183  0.0733  0.1465  0.1954  0.1954  0.1563  0.1042  0.0595  0.0298  0.0132  0.0053  0.0019  0.0006
5.0     0.0067  0.0337  0.0842  0.1404  0.1755  0.1755  0.1462  0.1044  0.0653  0.0363  0.0181  0.0082  0.0034
6.0     0.0025  0.0149  0.0446  0.0892  0.1339  0.1606  0.1606  0.1377  0.1033  0.0688  0.0413  0.0225  0.0113
7.0     0.0009  0.0064  0.0223  0.0521  0.0912  0.1277  0.1490  0.1490  0.1304  0.1014  0.0710  0.0452  0.0264
8.0     0.0003  0.0027  0.0107  0.0286  0.0573  0.0916  0.1221  0.1396  0.1396  0.1241  0.0993  0.0722  0.0481
9.0     0.0001  0.0011  0.0050  0.0150  0.0337  0.0607  0.0911  0.1171  0.1318  0.1318  0.1186  0.0970  0.0728
10.0    0.0000  0.0005  0.0023  0.0076  0.0189  0.0378  0.0631  0.0901  0.1126  0.1251  0.1251  0.1137  0.0948

Table A.2 Continued

λt           13      14      15      16      17      18      19      20      21      22      23      24
5.0     0.0013  0.0005  0.0002
6.0     0.0052  0.0022  0.0009  0.0003  0.0001
7.0     0.0142  0.0071  0.0033  0.0014  0.0006  0.0002  0.0001
8.0     0.0296  0.0169  0.0090  0.0045  0.0021  0.0009  0.0004  0.0002  0.0001
9.0     0.0504  0.0324  0.0194  0.0109  0.0058  0.0029  0.0014  0.0006  0.0003  0.0001
10.0    0.0729  0.0521  0.0347  0.0217  0.0128  0.0071  0.0037  0.0019  0.0009  0.0004  0.0002  0.0001


From Parzen, E., 1960, Modern Probability Theory and Its Applications, John Wiley & Sons, with permission.
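Table A.2 can be spot-checked in the same way from the Poisson mass function. A minimal standard-library sketch; the parameter name `lam_t` stands for the product $\lambda t$ and is chosen here for illustration.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam_t: float) -> float:
    """p_X(k) = (lambda t)^k exp(-lambda t) / k!."""
    return lam_t ** k * exp(-lam_t) / factorial(k)


print(round(poisson_pmf(0, 2.0), 4))     # table entry 0.1353
print(round(poisson_pmf(5, 5.0), 4))     # table entry 0.1755
print(round(poisson_pmf(13, 10.0), 4))   # table entry 0.0729
```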
A.3 STANDARDIZED NORMAL DISTRIBUTION FUNCTION

Table A.3 Standardized normal distribution function: a table of
$F_U(u) = \dfrac{1}{(2\pi)^{1/2}} \displaystyle\int_{-\infty}^{u} \exp\left(-\frac{x^2}{2}\right) dx$
for u = 0.0 to 3.69

u         0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0     0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1     0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2     0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3     0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4     0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5     0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6     0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7     0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8     0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9     0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0     0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1     0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2     0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3     0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4     0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5     0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6     0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7     0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8     0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9     0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0     0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1     0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2     0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3     0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4     0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5     0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6     0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7     0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8     0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9     0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0     0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1     0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2     0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3     0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4     0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.6     0.9998  0.9998  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999


From Parzen, E., 1960, Modern Probability Theory and Its Applications, John Wiley & Sons, with permission.
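Values of Table A.3 can be computed from the error function, since $F_U(u) = \tfrac{1}{2}\left[1 + \mathrm{erf}(u/\sqrt{2})\right]$. A minimal standard-library sketch, with a few spot checks against the table:

```python
from math import erf, sqrt


def std_normal_cdf(u: float) -> float:
    """F_U(u) for the standardized normal distribution, via the error function."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))


for u in (0.19, 1.73, 1.96, 2.80):
    print(u, round(std_normal_cdf(u), 4))
```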
A.4 STUDENT’S t DISTRIBUTION WITH n DEGREES OF FREEDOM

Table A.4 Student’s t distribution with n degrees of
freedom: a table of $t_{n,\alpha}$ in $P(T > t_{n,\alpha}) = \alpha$, for α = 0.005
to 0.10, n = 1, 2, ...

n         0.10    0.05   0.025    0.01   0.005
1        3.078   6.314  12.706  31.821  63.657
2        1.886   2.920   4.303   6.965   9.925
3        1.638   2.353   3.182   4.541   5.841
4        1.533   2.132   2.776   3.747   4.604
5        1.476   2.015   2.571   3.365   4.032
6        1.440   1.943   2.447   3.143   3.707
7        1.415   1.895   2.365   2.998   3.499
8        1.397   1.860   2.306   2.896   3.355
9        1.383   1.833   2.262   2.821   3.250
10       1.372   1.812   2.228   2.764   3.169
11       1.363   1.796   2.201   2.718   3.106
12       1.356   1.782   2.179   2.681   3.055
13       1.350   1.771   2.160   2.650   3.012
14       1.345   1.761   2.145   2.624   2.977
15       1.341   1.753   2.131   2.602   2.947
16       1.337   1.746   2.120   2.583   2.921
17       1.333   1.740   2.110   2.567   2.898
18       1.330   1.734   2.101   2.552   2.878
19       1.328   1.729   2.093   2.539   2.861
20       1.325   1.725   2.086   2.528   2.845
21       1.323   1.721   2.080   2.518   2.831
22       1.321   1.717   2.074   2.508   2.819
23       1.319   1.714   2.069   2.500   2.807
24       1.318   1.711   2.064   2.492   2.797
25       1.316   1.708   2.060   2.485   2.787
26       1.315   1.706   2.056   2.479   2.779
27       1.314   1.703   2.052   2.473   2.771
28       1.313   1.701   2.048   2.467   2.763
29       1.311   1.699   2.045   2.462   2.756
∞        1.282   1.645   1.960   2.326   2.576


From Fisher, R.A., 1925, Statistical Methods for Research Workers,
14th edn, Hafner Press. Reproduced by permission of The University of
Adelaide, Australia.
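For n = 1 and n = 2 the upper-tail probability of Student's t distribution has a closed form: $P(T > t) = \tfrac{1}{2} - \arctan(t)/\pi$ for n = 1, and $P(T > t) = \tfrac{1}{2} - t / \left(2\sqrt{t^2 + 2}\right)$ for n = 2. This gives a quick standard-library check on the first two rows of Table A.4. Other rows would need a numerical routine (SciPy's `scipy.stats.t.sf`, for instance), which this sketch does not use.

```python
from math import atan, pi, sqrt


def t_upper_tail(t: float, n: int) -> float:
    """P(T > t) for Student's t with n = 1 or n = 2 degrees of freedom."""
    if n == 1:
        return 0.5 - atan(t) / pi                    # Cauchy case
    if n == 2:
        return 0.5 - t / (2.0 * sqrt(t * t + 2.0))
    raise ValueError("closed form implemented only for n = 1 or n = 2")


checks = [(1, 6.314, 0.05), (1, 63.657, 0.005), (2, 2.920, 0.05), (2, 9.925, 0.005)]
for n, t, alpha in checks:
    print(n, t, alpha, round(t_upper_tail(t, n), 4))
```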
A.5 CHI-SQUARED DISTRIBUTION WITH n DEGREES OF FREEDOM

Table A.5 Chi-squared distribution with n degrees of freedom: a table of $\chi^2_{n,\alpha}$ in
$P(D > \chi^2_{n,\alpha}) = \alpha$, for α = 0.005 to 0.995, n = 1 to 30

n           0.995       0.99      0.975       0.95       0.05      0.025       0.01      0.005
1       0.0000393   0.000157   0.000982    0.00393      3.841      5.024      6.635      7.879
2          0.0100     0.0201     0.0506      0.103      5.991      7.378      9.210     10.597
3          0.0717      0.115      0.216      0.352      7.815      9.348     11.346     12.838
4           0.207      0.297      0.484      0.711      9.488     11.143     13.277     14.860
5           0.412      0.554      0.831      1.145     11.070     12.832     15.086     16.750
6           0.676      0.872      1.237      1.635     12.592     14.449     16.812     18.548
7           0.989      1.239      1.690      2.167     14.067     16.013     18.475     20.278
8           1.344      1.646      2.180      2.733     15.507     17.535     20.090     21.955
9           1.735      2.088      2.700      3.325     16.919     19.023     21.666     23.589
10          2.156      2.558      3.247      3.940     18.307     20.483     23.209     25.188
11          2.603      3.053      3.816      4.575     19.675     21.920     24.725     26.757
12          3.074      3.571      4.404      5.226     21.026     23.337     26.217     28.300
13          3.565      4.107      5.009      5.892     22.362     24.736     27.688     29.819
14          4.075      4.660      5.628      6.571     23.685     26.119     29.141     31.319
15          4.601      5.229      6.262      7.261     24.996     27.488     30.578     32.801
16          5.142      5.812      6.908      7.962     26.296     28.845     32.000     34.267
17          5.697      6.408      7.564      8.672     27.587     30.191     33.409     35.718
18          6.265      7.015      8.231      9.390     28.869     31.526     34.805     37.156
19          6.844      7.633      8.907     10.117     30.144     32.852     36.191     38.582
20          7.434      8.260      9.591     10.851     31.410     34.170     37.566     39.997
21          8.034      8.897     10.283     11.591     32.671     35.479     38.932     41.401
22          8.643      9.542     10.982     12.338     33.924     36.781     40.289     42.796
23          9.260     10.196     11.689     13.091     35.172     38.076     41.638     44.181
24          9.886     10.856     12.401     13.848     36.415     39.364     42.980     45.558
25         10.520     11.524     13.120     14.611     37.652     40.646     44.314     46.928
26         11.160     12.198     13.844     15.379     38.885     41.923     45.642     48.290
27         11.808     12.879     14.573     16.151     40.113     43.194     46.963     49.645
28         12.461     13.565     15.308     16.928     41.337     44.461     48.278     50.993
29         13.121     14.256     16.047     17.708     42.557     45.722     49.588     52.336
30         13.787     14.953     16.791     18.493     43.773     46.979     50.892     53.672

From Pearson, E.S. and Hartley, H.O., 1954, Biometrika Tables for Statisticians, Volume 1, Cambridge University Press, with permission.
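Entries of this table can be reproduced numerically: each one is an upper-tail point, a value $c$ with $P(\chi^2_n > c) = \alpha$ for the column's $\alpha$ (for example, 18.307 for $n = 10$ and $\alpha = 0.05$). The sketch below is our own and not the method used to compute the published table. It evaluates the chi-squared distribution function through the power series of the regularized incomplete gamma function and inverts it by bisection, using only Python's standard library; the names `chi2_cdf` and `chi2_upper` are hypothetical.

```python
import math


def chi2_cdf(x: float, n: int) -> float:
    """Distribution function of a chi-squared variable with n degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(a, z),
    with a = n/2 and z = x/2, by its power series:
        P(a, z) = z**a * exp(-z) / Gamma(a) * sum_k z**k / (a (a+1) ... (a+k))
    """
    if x <= 0.0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a  # k = 0 term of the series
    total = term
    k = 0
    while term > 1e-15 * total:  # all terms are positive; stop when negligible
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_upper(alpha: float, n: int) -> float:
    """Return c such that P(chi-squared_n > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, n) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, n) > alpha:  # tail still too large: c lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (2, 3, 10):
        row = [chi2_upper(alpha, n) for alpha in (0.995, 0.05, 0.005)]
        print(n, ["%.4f" % c for c in row])
```

The series converges for every $x > 0$, which keeps the sketch short; for very large $x$ or $n$ a continued-fraction evaluation of the upper tail would be more efficient, and in practice a spreadsheet or statistics package routine would normally be used instead.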
A.6 $D_2$ DISTRIBUTION WITH SAMPLE SIZE n


Table A.6 $D_2$ distribution with sample size n: a table of $c_{n,\alpha}$ in $P(D_2 > c_{n,\alpha}) = \alpha$, for $\alpha = 0.01$ to $0.10$, $n = 5, 10, \ldots$

    n      α = 0.10   α = 0.05   α = 0.01
    5        0.51       0.56       0.67
   10        0.37       0.41       0.49
   15        0.30       0.34       0.40
   20        0.26       0.29       0.35
   25        0.24       0.26       0.32
   30        0.22       0.24       0.29
   40        0.19       0.21       0.25
 Large n   1.22/√n    1.36/√n    1.63/√n

From Lindgren, B.W., 1962, Statistical Theory, Macmillan, with permission. 
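In a test, the table is used by comparing an observed $D_2$ with $c_{n,\alpha}$ and rejecting the hypothesized distribution at level $\alpha$ when $D_2 > c_{n,\alpha}$. The minimal sketch below assumes that $D_2$ is the usual Kolmogorov-Smirnov statistic $\max_x |F_n(x) - F_X(x)|$ (the index places both the $D_2$ distribution and the Kolmogorov-Smirnov test on page 327). The names `ks_statistic`, `critical_value`, `TABLE_A6`, and `LARGE_N` are hypothetical; the numbers are copied from Table A.6.

```python
import math

# Critical values c_{n,alpha} copied from Table A.6 (alpha = 0.10, 0.05, 0.01).
TABLE_A6 = {
    5: (0.51, 0.56, 0.67),
    10: (0.37, 0.41, 0.49),
    15: (0.30, 0.34, 0.40),
    20: (0.26, 0.29, 0.35),
    25: (0.24, 0.26, 0.32),
    30: (0.22, 0.24, 0.29),
    40: (0.19, 0.21, 0.25),
}
LARGE_N = (1.22, 1.36, 1.63)  # numerators of the large-n entries, divided by sqrt(n)
ALPHAS = (0.10, 0.05, 0.01)


def ks_statistic(sample, cdf):
    """D_2 = max over x of |F_n(x) - F(x)| for a sample and a hypothesized CDF.

    The maximum is attained at an order statistic, so it is enough to compare
    F at each sorted value with the empirical CDF just before and just after
    its jump there.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def critical_value(n, alpha):
    """c_{n,alpha} from a tabulated row, or the large-n formula when n > 40."""
    col = ALPHAS.index(alpha)
    if n in TABLE_A6:
        return TABLE_A6[n][col]
    if n > 40:
        return LARGE_N[col] / math.sqrt(n)
    raise ValueError("n is not a tabulated row; interpolate or use software")
```

For instance, `ks_statistic([0.1, 0.4, 0.7, 0.9], lambda x: x)` compares four observations with a uniform distribution on (0, 1); the largest gap occurs at 0.7, where $F_X = 0.7$ and the empirical CDF just below it is 0.5, giving $D_2 = 0.2$.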
Appendix B: Computer Software


A large number of computer software packages and spreadsheets are now
available that can be used to generate probabilities such as those provided in
Tables A.1-A.6 as well as to perform other statistical calculations. For example,
the statistical functions in Microsoft® Excel™ 2000 listed below can be used
to carry out many probability calculations and to do many of the exercises in
the text.

AVEDEV: gives the average of the absolute deviations of data points from 
their mean 

AVERAGE: gives the average of its arguments 

AVERAGEA: gives the average of its arguments, including numbers, text, and 
logical values 

BETADIST: gives the beta probability distribution function 
BETAINV: gives the inverse of the beta probability distribution function 
BINOMDIST: gives the individual term binomial probability 
CHIDIST: gives the one-tailed probability of the Chi-squared distribution
CHIINV: gives the inverse of the one-tailed probability of the Chi-squared
distribution

CHITEST: gives the test for independence 

CONFIDENCE: gives the confidence interval for a population mean
CORREL: gives the correlation coefficient between two data sets 
COUNT: counts how many numbers are in the list of arguments 
COUNTA: counts how many values are in the list of arguments 
COVAR: gives covariance, the average of the products of paired deviations 
CRITBINOM: gives the smallest value for which the cumulative binomial
distribution is greater than or equal to a criterion value
DEVSQ: gives the sum of squares of deviations 
EXPONDIST: gives the exponential distribution 
FORECAST: gives a value along a linear trend
FREQUENCY: gives a frequency distribution as a vertical array
GAMMADIST: gives the gamma distribution

GAMMAINV: gives the inverse of the gamma distribution function 

GAMMALN: gives the natural logarithm of the gamma function 

GEOMEAN: gives the geometric mean 

GROWTH: gives values along an exponential trend

HYPGEOMDIST: gives the hypergeometric distribution 

INTERCEPT: gives the intercept of the linear regression line 

KURT: gives the kurtosis of a data set 

LARGE: gives the kth largest value in a data set

LINEST: gives the parameters of a linear trend 

LOGEST: gives the parameters of an exponential trend 

LOGINV: gives the inverse of the lognormal distribution 

LOGNORMDIST: gives the lognormal distribution function 

MAX: gives the maximum value in a list of arguments 

MAXA: gives the maximum value in a list of arguments, including numbers, 
text, and logical values 

MEDIAN: gives the median of the given numbers 
MIN: gives the minimum value in a list of arguments 

MINA: gives the smallest value in a list of arguments, including numbers, text, 
and logical values 

MODE: gives the most common value in a data set 
NEGBINOMDIST: gives the negative binomial distribution 
NORMDIST: gives the normal distribution function 
NORMINV: gives the inverse of the normal distribution function 
NORMSDIST: gives the standardized normal distribution function 
NORMSINV: gives the inverse of the standardized normal distribution function 
PERCENTILE: gives the kth percentile of values in a range
PERCENTRANK: gives the percentage rank of a value in a data set 
PERMUT: gives the number of permutations for a given number of objects 
POISSON: gives the Poisson distribution 

PROB: gives the probability that values in a range are between two limits 
QUARTILE: gives the quartile of a data set 
RANK: gives the rank of a number in a list of numbers 
SKEW: gives the skewness of a distribution 
SLOPE: gives the slope of the linear regression line 
SMALL: gives the kth smallest value in a data set
STANDARDIZE: gives a normalized value 
STDEV: estimates standard deviation based on a sample 
STDEVA: estimates standard deviation based on a sample, including numbers, 
text, and logical values 

STDEVP: calculates standard deviation based on the entire population 
STDEVPA: calculates standard deviation based on the entire population, 
including numbers, text, and logical values 
STEYX: gives the standard error of the predicted y value for each x in the
regression 

TDIST: gives the Student’s t-distribution 
TINV: gives the inverse of the Student’s t-distribution 
TREND: gives values along a linear trend 
TRIMMEAN: gives the mean of the interior of a data set 
TTEST: gives the probability associated with a Student’s t-test 
VAR: estimates variance based on a sample 

VARA: estimates variance based on a sample, including numbers, text, and 
logical values 

VARP: calculates variance based on the entire population 
VARPA: calculates variance based on the entire population, including
numbers, text, and logical values
WEIBULL: gives the Weibull distribution
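Outside a spreadsheet, several of the functions above have close counterparts in Python's standard library. The pairing in the sketch below is our own reading of the descriptions in this list, not a correspondence documented by Microsoft or by Python; it assumes Python 3.8 or later (for `statistics.geometric_mean`, `statistics.NormalDist`, and `math.comb`), and the helper names `binom_pmf` and `poisson_pmf` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

data = [2.1, 3.4, 1.9, 5.0, 3.4]

# Sample (n - 1 divisor) versus population (n divisor) statistics,
# the distinction drawn by STDEV/STDEVP and VAR/VARP above.
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)
sample_var = statistics.variance(data)
population_var = statistics.pvariance(data)

# Counterparts of MEDIAN, MODE and GEOMEAN.
med = statistics.median(data)
most_common = statistics.mode(data)
gmean = statistics.geometric_mean(data)

# Standardized normal distribution function and its inverse
# (the quantities described for NORMSDIST and NORMSINV).
z = NormalDist()  # mean 0, standard deviation 1
phi = z.cdf(1.96)
z_975 = z.inv_cdf(0.975)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Individual-term binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events when the mean count is `mean`."""
    return mean**k * math.exp(-mean) / math.factorial(k)
```

For example, `binom_pmf(2, 4, 0.5)` returns $\binom{4}{2}/2^4 = 0.375$. Keeping the sample and population versions side by side mirrors the STDEV/STDEVP pairing: for the five values above, the two variances differ exactly by the factor $5/4$.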
Problems


CHAPTER 2 

2.1 (a) Incorrect, (b) Correct, (c) Correct, (d) Correct, (e) Correct, (f) Correct

2.4 (a) {1, 2, ..., 10}, (b) {1, 3, 4, 5, 6}, (c) {2, 7}, (d) {2, 4, 6, 7, 8, 9, 10},
(e) {1, 2, ..., 10}, (f) {1, 3, 4, 5, 6}, (g) {1, 5}

2.7 (aMSC, {h)ABC, {c) {AB C)\J {ABC)\J {A BC), (d)AU5UC, 
(e) {ABC) U {ABC), (f) ABC, (g) {AB) U {BC) U {CA), (h) ABC, 

(i) ABC _ _ 

2.9 (a) AAB, (b) AB U AB 
2.11 (a) 0.00829, (b) 0.00784, (c) 0.00829 

2.14 (a) 0.553, (b) 0.053, (c) 0.395 
2.16 0.9999 

2.18 (a) 0.8865, (b) $[1 - (1 - p_A)(1 - p_C)][1 - (1 - p_B)(1 - p_D)]$

2.20 No 

2.22 No, (a) $P(A) = P(B) = 0.5$, (b) Impossible

2.23 Under condition of mutual exclusiveness: (a) false, (b) true, (c) false, (d) true, (e) false

Under condition of independence: (a) true, (b) false, (c) false, (d) false, (e) true

2.24 (a) Approximately 10^^, (b) Yes, (c) 0.00499 

2.26 (a)—^—, (b)-—— 

t I — IQ 

2.28 (a) 0.35, (b) 0.1225, (c) 0.65 
2.30 (a) 0.08, (b) 0.375 
2.32 (a) 0.351, (b) 0.917, (c) 0.25 
2.34 (a) 0.002, (b) 0.086, (c) 0.4904 
CHAPTER 3


3.1 (a)a=l, 


(c) a = 2, p{x) 
(e) a > 0, 



for X = 5 
elsewhere 


f ax° ', for 0 < X < 1 
\ 0, elsewhere 


or a = 0, 


f 1, for X = 0 
[ 0, elsewhere 


3.2 

3.4 


(g) a= 1/2, neither pdf nor pmf exists 

(a) 1,1/3,63/64,1 - e-‘>“, 1,1, (2 - e-h3)/2 

(b) 1,1,127/128, e-“/2 _ j _ ( 1 / 2 )“, l/2,(e-'/4 _ e-^/2)/2 

(a) $F_X(x) = \begin{cases} 0, & x < 90 \\ 0.1x - 9, & 90 \le x \le 100 \\ 1, & x > 100 \end{cases}$


(b) 


(c) Fx{x) 

3.6 2/3 
3.9 


Fx{x) = 


-tan 

TT 


X + 


2 ’ 


0, for X < 0 

2x — x^, for 0 < X < 1 
1, for X > 1 

for —00 < X < 00 


Fx{x) 


fx{x) 


0, for X < 0 

Y, for 0 < X < h 
b 

1, for X > 6 

^, for 0 < X < h 
b 

0, elsewhere 


3.11 3/a 

3.12 (b) 1/6 
3.13 (a): (i) $p_X(x) = \begin{cases} 0.6, & x = 1 \\ 0.4, & x = 2 \end{cases}$, $p_Y(y) = \begin{cases} 0.6, & y = 1 \\ 0.4, & y = 2 \end{cases}$

(iii) f t t ® for X > 0 r ( \ \ ^ for j > 0

Jx\^l elsewhere elsewhere

(b): (i) No, (iii) Yes

3.17 $[F_X(x) - F_X(100)]/[1 - F_X(100)]$, $x > 100$
3.19 (a) 0.087, (b) 0.3174, (c) 0.274 
3.22 0.0039 

$p_Y(x) = \begin{cases} 0.016, & x = 1 \\ 0.035, & x = 2 \\ 0.080, & x = 3 \\ 0.125, & x = 4 \\ 0.415, & x = 5 \\ 0.192, & x = 6 \\ 0.137, & x = 7 \end{cases}$

(b) Table of $p_{X_1X_2}(i, j)$

i \ j   1      2      3      4      5      6      7
  1   0.006  0.004  0.003  0.003  0.004  0.000  0.000
  2   0.002  0.009  0.008  0.005  0.010  0.002  0.001
  3   0.003  0.008  0.015  0.014  0.031  0.008  0.005
  4   0.001  0.004  0.015  0.027  0.051  0.017  0.011
  5   0.002  0.007  0.029  0.054  0.196  0.075  0.050
  6   0.001  0.002  0.005  0.015  0.071  0.060  0.032
  7   0.000  0.001  0.005  0.008  0.052  0.030  0.038
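The single-variable pmf listed just before the joint table agrees with the column sums of the table, $\sum_i p_{X_1X_2}(i, j)$, to within the rounding of the tabulated entries (the sums for $j = 1$ and $j = 4$ come out 0.015 and 0.126 instead of 0.016 and 0.125), so it appears to be the marginal pmf over the column index $j$. A short check in Python, with the values transcribed from the answer; `column_marginal` is a hypothetical helper name.

```python
# Joint pmf p(i, j) transcribed from part (b): row index i = 1..7, column index j = 1..7.
joint = [
    [0.006, 0.004, 0.003, 0.003, 0.004, 0.000, 0.000],
    [0.002, 0.009, 0.008, 0.005, 0.010, 0.002, 0.001],
    [0.003, 0.008, 0.015, 0.014, 0.031, 0.008, 0.005],
    [0.001, 0.004, 0.015, 0.027, 0.051, 0.017, 0.011],
    [0.002, 0.007, 0.029, 0.054, 0.196, 0.075, 0.050],
    [0.001, 0.002, 0.005, 0.015, 0.071, 0.060, 0.032],
    [0.000, 0.001, 0.005, 0.008, 0.052, 0.030, 0.038],
]
# Single-variable pmf listed just before the table, for x = 1..7.
listed = [0.016, 0.035, 0.080, 0.125, 0.415, 0.192, 0.137]


def column_marginal(table):
    """Marginal pmf over the column index: p(j) = sum over i of p(i, j)."""
    return [sum(row[j] for row in table) for j in range(len(table[0]))]


marginal = column_marginal(joint)
for j, (computed, given) in enumerate(zip(marginal, listed), start=1):
    print(f"j = {j}: column sum {computed:.3f}, listed {given:.3f}")
print(f"total probability {sum(marginal):.3f}")
```

Summing the rows instead (`sum(row)` for each row) would give the marginal pmf over $i$; the entries of the whole table add to 1.000, as a joint pmf must.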


CHAPTER 4 

4.1 (a) 5, 0; (c) 2, 2; (e) $a/(a + 1)$, $a/[(a + 1)^2(a + 2)]$; (g) 1, 3

4.3 2.44 min 

4.6 (a) 1/2, (b) 2, 4; (c) 0, 1 
4.12 (a)(l-/7)/A, (b)l/A 
4.14 24 min 

4.16 $P(|X - 1| < 0.75) \ge 0.41$ by the Chebyshev inequality; $P(|X - 1| < 0.75) = 0.75$

4.19 (a) $P(55 < Y < 85) \ge 0$

(b) $P(55 < X < 85) \ge 5/9$, a much improved bound
4.20 3/4
4.23 1.53 
4.25 

4.27 (a) mx, +mx^,(j\^ 

(b) (TX2l{o-x, + it approaches one if » a\^ 

4.28 n, In 

4.30 (a) i?xit) = 5, 0 

oo 1 ,, 

(c) (t)x(0= >2, 2 

k=\^ 


(e) 4’x(0 


A 1\ 2 2 1 

jt V jtj'^(jt)2’3’18 


CHAPTER 5 

5.1 (a) 

{ 0, for j < 8 

for 8 < j < 17 
1, forj>17 

r 0, for j < 8 

'P’r(j) = < for8<j<17 

[ 1, for j > 17 


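5.3 $f_Y(y) = \begin{cases} (y + 1)/9, & -1 \le y \le 2 \\ (5 - y)/9, & 2 \le y \le 5 \\ 0, & \text{elsewhere} \end{cases}$

5.5 friy)

- forj>0

4(27r)‘/2

0, elsewhere

5.9 $f_X(x) = \begin{cases} \dfrac{2}{\pi(a^2 - x^2)^{1/2}}, & 0 < x < a \\ 0, & \text{elsewhere} \end{cases}$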
5.10 (a)

fwH 


0.19 

2 a{w/af^[ 36.6 ) 
0, elsewhere 


(way^Y'"' 

36.6 j 


for vr > 0 


mw = 1.71a X 10^, = 8.05a^ x 10^ 

(b) Same as (a) 

5.12 $Y$ is discrete and


Priy) 


fx{x)dx, 


fx{x)dx, 


for y = I 
for j = 0 


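5.14 (a) $f_A(a) = \begin{cases} \dfrac{1}{0.08\,r_0(\pi a)^{1/2}}, & 4\pi(0.99r_0)^2 < a < 4\pi(1.01r_0)^2 \\ 0, & \text{elsewhere} \end{cases}$

(b) $f_V(v) = \begin{cases} \dfrac{1}{0.08\pi r_0}\left(\dfrac{3v}{4\pi}\right)^{-2/3}, & \tfrac{4}{3}\pi(0.99r_0)^3 < v < \tfrac{4}{3}\pi(1.01r_0)^3 \\ 0, & \text{elsewhere} \end{cases}$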


5.16 (a) $f_Y(y) = \begin{cases} (2 + y)/4, & -2 \le y \le 0 \\ (2 - y)/4, & 0 \le y \le 2 \\ 0, & \text{elsewhere} \end{cases}$

(b) Same as (a)
5.21 $f_T(t) = \begin{cases} (a_1 + a_2 + \cdots + a_n)\,e^{-(a_1 + a_2 + \cdots + a_n)t}, & t > 0 \\ 0, & \text{elsewhere} \end{cases}$


5.23 fy(y) 
5.25 


fx 2 iX 2 )U'x,iX 2 + y) +fxSx2 - y)]dX2, -oo < J < oo 


friy) 


yQ y^!2^ fQ]- j; > 0 

0, elsewhere 
5.27


3j2 

fviy) = { (1 + f ) 


4 , for j > 0 


0, elsewhere 


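5.29 $f_{R\Phi}(r, \phi) = \begin{cases} \dfrac{r}{2\pi\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r > 0,\ -\pi < \phi < \pi \\ 0, & \text{elsewhere} \end{cases}$

$f_R(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, & r > 0 \\ 0, & \text{elsewhere} \end{cases}$, $f_\Phi(\phi) = \begin{cases} \dfrac{1}{2\pi}, & -\pi < \phi < \pi \\ 0, & \text{elsewhere} \end{cases}$

$R$ and $\Phi$ are independent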


CHAPTER 6 


6.3 (a) 0.237, (b) 3.75

6.5 0.611, 4.2

6.8

Wi-ECyo-rl"-* 

(b) ^ (k-m)Qp'^{\-p)' 

k=m-\-\ 


6.10 0.584 

-h'l' 


6.14 0.096 

6.17 (a) 0.349, (b) 5 

6.26 0.93 

6.28 1.4 X lO^^"* 

6.30 $p_X(k) = 22.5^k e^{-22.5}/k!$, $k = 0, 1, 2, \ldots$


6.32 p^(0, t) 



^exp {—fjw) 
k\ 


^= 0 , 1 , 2 ,... 
CHAPTER 7


7.1 0.847 

7.3 (a) 0.9, (b) 0.775 

7.6 (a) 4.566 x lO^^, (b) 0.8944, (c) 0.383, (d) 0.385 
7.9 $X_2$ is preferred in both cases
7.14 0.0062 
7.20 (a) 0.221 


(b) 

friy) 


1 

- 17 ^- 

(0.294)(27r)*'^ {y — a) 



0, elsewhere 


my = a + b 


for j > a 


7.22 0.153

7.30 (a) 0.056, (b) 0.989

7.34 0.125, 0, 0, 0.875. No partial failure is possible

7.36 $f_S(s) = \begin{cases} n(n - 1)\displaystyle\int_{-\infty}^{\infty} [F_X(y) - F_X(y - s)]^{n-2} f_X(y - s) f_X(y)\,dy, & s > 0 \\ 0, & \text{elsewhere} \end{cases}$


CHAPTER 8 


8.2 (a) 

(i) 


(ii) 

(d) 

(i) 

(f) 

(i) 

(h) 

(i) 

a) 

(i) 

(i) 

(i) 


Type-1 asymptotic maximum-value distribution is suggested 
a = 0.025, u = 46.92 

(ii) A ^ 0.317 
(ii) V ^ 45.81 
(ii) m = 2860, a = 202.9 
(ii) V ^ 7.0 

(ii) ex ^ 76.2, 0.203 


Gamma is suggested, 
Poisson is suggested. 
Normal is suggested, 
Poisson is suggested. 
Lognormal is suggested. 


CHAPTER 9 

9.1 1.75, 27.96 

$f_Y(y) = \begin{cases} 10y^9, & 0 < y < 1 \\ 0, & \text{elsewhere} \end{cases}$, $f_Z(z) = \begin{cases} 10(1 - z)^9, & 0 < z < 1 \\ 0, & \text{elsewhere} \end{cases}$

(c) 0.091, 0.91
9.7

9.11 

9.14 

9.15 

9.20 


9.22 

9.24 

9.26 


9.30 

9.32 

9.34 

9.36 

9.38 


$\hat{\Theta}_1$ is better

It is biased, but unbiased as $n \to \infty$
(a) $\theta^2/n$, (b) $\theta^2/n$, (c) $\theta(1 - \theta)/n$, (d) $\theta/n$

$\sigma^2/n$, $2\sigma^4/n$


(a) 1 -1, 
X 

9 




k-\ 


^ML = -^(1), AmE = X — I 
$\lambda = 0.13\ \text{sec}^{-1}$

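(a) $\hat{\lambda}_{ML} = \hat{\lambda}_{ME} = (\bar{T} - t_0)^{-1}$

(b) $\hat{t}_{0,ML} = T_{(1)}$, $\hat{t}_{0,ME} = \bar{T} - (1/\lambda)$

$\hat{\lambda}_{ML} = [\bar{T} - T_{(1)}]^{-1}$, $\hat{t}_{0,ML} = T_{(1)}$; $\hat{\lambda}_{ME} = (M_2 - \bar{T}^2)^{-1/2}$, $\hat{t}_{0,ME} = \bar{T} - (M_2 - \bar{T}^2)^{1/2}$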

(a) $l_{1,2} = 63.65, 81.55$, (b) $l_{1,2} = 70.57, 84.43$, (c) $l_{1,2} = 77.74, 89.46$
(a) 9.16, (b) $l_{1,2} = 8.46, 9.86$

(a) /i ,2 = 1072,1128, (b) /e 2 = 1340,6218 and h = 1478 
384 


CHAPTER 10 

10.1 More likely to be accepted at $\alpha = 0.01$
10.3 Hypothesis is accepted 
10.5 Hypothesis is accepted 
10.8 Poisson hypothesis is rejected 
10.10 Gamma hypothesis is accepted 
10.12 Normal hypothesis is accepted 
10.14 Poisson hypothesis is accepted 
10.16 Hypothesis is accepted 


CHAPTER 11 

11.1 (a) 





-1 
(b)


E{B} = ^, var{5} = a2 K]: 


!=1 


(C) 

s5 = ^j:(r,-Av,)= 

^ /-I 

11.3 $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(Y_i - A - Bx_i)^2$, hence biased.

11.7 $14.98 + 32.14 \log_{10} v$

11.9 (a) $\alpha = 1.486$ and $\beta = 0.022$, (b) l\ = 0.006, (c) Not significantly different from zero

11.11 $\beta_0 = 66.18$, $\beta_1 = 0.42$, $\beta_2 = 46.10$

11.13 $\beta_0 = 717.18$, $\beta_1 = 10.84$, $\beta_2 = -3.78$, $\beta_3 = -1.57$, $\beta_4 = 0.38$
variance, 223, 237
Bias, 265 

Binomial distribution, 43, 162, 182-183 
characteristic function, 164 
mean, 164, 184 

Poisson approximation, 182-183 
table, 365-366 
variance, 164, 184 
Boole’s inequality, 30 
Brownian motion, 106 

Cauchy distribution, 126 
Central limit theorem, 199-201 
Characteristic function, 98 
joint, 108 

Chebyshev inequality, 86-87 
Chi-squared distribution, 219-221, 236 
mean, 221, 236 
table, 371 
variance, 221, 236 
Chi-squared test, 316
Coefficient of excess, 83 
Coefficient of skewness, 83 
Coefficient of variation, 81 
Computer software, 3, 375-377 
Confidence interval, 295, 296, 298, 302 
Confidence limit, 347 
Consistency, 274 
Correlation, 88-90 
perfect, 90 
zero, 90 


Correlation coefficient, 88-89 
Covariance, 88 
matrix, 93 

Cramer-Rao inequality, 267-270 
lower bound (CRLB), 269 
Cumulant, 101 

Cumulative distribution function, see
Probability distribution function 

$D_2$ distribution, 327
table, 372 

De Morgan’s laws, 11-12 
Density function, see Probability density
function 

Distribution function, see Probability 
distribution function 

Efficiency, 270 
asymptotic, 271 
Error, 316 
type I, 316 
type II, 316 
Estimate, 264 
Estimator, 265 
consistent, 274 
efficient, 270, 271
sufficient, 275 

unbiased minimum-variance, 266 
Event, 12 
Excel 2000, 3 
Expectation, 75-76 
conditional, 83-85 
mathematical, 75 
operator, 75 

Exponential distribution, 45, 78, 215-219, 
236 

mean, 215, 236 
variance, 215, 236 
Exponential failure law, 218 
Extreme-value distribution, 226
type I, 228, 237 
type II, 233, 237 
type III, 234, 237 

Failure rate, see Hazard function 
Fisher-Neyman factorization criterion, 275 
Frequency diagram, 248 
Function of random variables, 119, 137 
moments, 134 

probability distributions, 120 

Gamma distribution, 212-215, 236 
mean, 213, 236 
variance, 213, 236 
Gauss-Markov theorem, 345
Gaussian distribution, see Normal 
distribution 

Geometric distribution, 167, 184 
mean, 168, 184 
variance, 168, 184 
Gumbel’s extreme value, 228 
distribution, 228 

Hazard function, 218 
Histogram, 248 
cumulative, 327 

Hypergeometric distribution, 167, 184 
mean, 184 
variance, 184 

Hypothesis testing, see Test of hypothesis 

Independence, 19-20 
mutual, 18 
Interarrival time, 215 

Jacobian, 149 

Kolmogorov-Smirnov test, 327 

Law of large numbers, 96 
Least-square estimator, 354-355
covariance, 356 

linear unbiased minimum variance, 344 
mean, 355 
variance, 355 
Likelihood equation, 288 
Likelihood function, 288 
Linear regression, 335 
multiple, 354 
other models, 357 
simple, 335 
variance, 343 


Lognormal distribution, 209-212, 236 
mean, 211, 236 
variance, 211, 236 

MacLaurin series, 99 
Markovian property, 27 
Markov’s inequality, 115 
Mass function, see Probability mass function 
Maximum likelihood estimate, 288 
Maximum likelihood estimator, 288-289 
consistency, 289 
efficiency, 289 
invariance property, 290 
Mean, 76-77 
conditional, 84 
Median, 76 
Mode, 78 
Moment, 76, 78 
central, 79 
joint, 87 
joint central, 87 
Moment estimate, 278 
Moment estimator (ME), 278-280 
combined, 284 
consistency, 279 

Moment-generating function, 112, 117 
Multinomial distribution, 172, 184 
covariance, 173 
mean, 173, 184 
variance, 173, 184 
Mutual exclusiveness, 13 

Negative binomial distribution, 169, 184 
mean, 171, 184 
variance, 171, 184 

Normal distribution, 107, 196-199, 236 
bivariate, 111

characteristic function, 198 
mean, 198, 236 
multivariate, 205 
standardized, 201 
table, 369 
variance, 198, 236 
Normal equation, 338 
Nuisance parameter, 284 

Parameter estimation, 259 
interval estimation, 294-295 
maximum likelihood method, 287 
moment method, 278 
point estimation, 277 





Subject Index 




Pascal distribution, see Negative binomial 
distribution 

Poisson distribution, 173-176, 184 
mean, 176, 184 
table, 367 
variance, 176, 184 
Population, 259 
Probability, 13 
assignment, 16, 17 
conditional, 20-21 
function, 13 
measure, 13 

Probability density function (pdf), 44-46
conditional, 62-63 
joint (jpdf), 49-51 
marginal, 57 

Probability distribution function (PDF), 39-41 
bivariate, 49 
conditional, 61 
joint (JPDF), 49-51 
marginal, 50 
mixed-type, 46 

Probability mass function (pmf), 41, 43 
conditional, 61 
joint (jpmf), 51-55 
marginal, 52 

Random experiment, 12 
Random sample, see Sample 
Random variable, 37-39 
continuous, 38 
discrete, 38 
function of, 120 
sum of, 145 
Random vector 
Random walk, 52 
Range space, 120 
Regression coefficient, 336 
confidence interval, 347 
least-square estimate, 344 
test of hypothesis, 316 
Relative likelihood, 16-17 
Reliability, 60, 218 
Residual, 337 
Return period, 169 

Sample, 259 
size, 260 
value, 260 

Sample mean, 97, 261 
mean, 261 
variance, 261 


Sample moment, 263-264 
Sample point, 12 
Sample space, 12 
Sample variance, 262-263 
mean, 262 
variance, 262 
Schwarz inequality, 92 
Set, 8-12 
complement of, 9 
countable (enumerable), 8 
disjoint, 10 
element, 8 
empty, 9 
finite, 8 
infinite, 8 
subset of, 8 

uncountable (nonenumerable), 8 
Set operation, 9-12 
difference, 10 
intersection (product), 10 
union (sum), 9 
Significance level, 319 
Spreadsheet, 3 
Standard deviation, 79-81 
Statistic, 260 
sufficient, 275 

Statistical independence, see 
Independence 
Stirling’s formula, 107
Student’s t-distribution, 298-299
table, 370 

Sum of random variables, 93, 
145-146 

characteristic function, 104-105
moment, 94 

probability distribution, 106, 146 

Test of hypothesis, 316 
Total probability theorem, 23
Tree diagram, 27-28 

Unbiasedness, 265 
Uniform distribution, 57, 189, 236 
bivariate, 193 
mean, 192, 236 
variance, 192, 236 
Unimodal distribution, 79 

Variance, 79, 82 
Venn diagram, 9 

Weibull distribution, 235 